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Abstract 


The  aim  of  this  course  is  to  present  a  basic  technique  from  proof  theory, 
Gentzen’s  normalization  for  natural  deduction  systems,  and  to  discuss  some 
of  its  applications. 

By  normalization  we  mean  a  collection  of  algorithms  transforming  a  given 
derivation  into  a  certain  normal  form.  A  derivation  is  called  normal  if  it 
does  not  contain  any  detour  or  —  in  Gentzen's  terminology  —  cut ,  i.  e.  an 
application  of  an  introduction  rule  immediately  followed  by  an  application  of 
an  elimination  rule.  Such  normalization  algorithms  are  useful  because  they 
allow  to  “straighten  out^complex  derivations  and  in  this  way  extract  hidden 
information. 

We  will  treat  many  applications  which  demonstrate  this,  e.  g.  the  sub¬ 
formula  principle.  Herbrand's  theorem,  the  interpolation  theorem,  and  an 
exact  characterization  of  the  initial  cases  of  transfinite  induction  provable  in 
arithmetic. 

From  the  computer  science  point  of  view,  an  even  more  interesting  field 
of  application  for  normalization  algorithms  is  the  possiblility  to  extract  the 
constructive  content  of  a  maybe  complex  mathematical  argument.  Such  al¬ 
gorithms  can  yield  verified  programs  from  derivations  proving  that  certain 
specifications  can  be  fulfilled.  Of  course,  the  feasability  of  programs  obtained 
in  this  way  will  depend  to  a  large  extent  on  a  good  choice  of  the  derivation, 
which  should  be  done  on  the  basis  of  a  good  idea  for  an  algorithm.  However, 
in  this  approach  it  is  possible  to  use  ordinary  mathematical  machinery  for 
the  development  of  programs.  j  _ _ 

Limitations  in  time  have  kept  me  from  including  more  material  into  these 
notes.  In  particular,  I  have  left  out  the  subject  of  codes  for  infinitary  de¬ 
rivations  and  their  use  to  extract  bounds  from  proofs  of  classical  existence 
theorems  (cf.  [Sch77b,Sch86]).  I  also  had  to  leave  out  a  planned  chapter  on 
normalization  for  type  theory;  part  of  what  I  wanted  to  say  is  contained  in 
[Sch88] . 
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Chapter  1 

Normalization  for 
propositional  logic 


We  begin  with  the  simplest  possible  logical  system,  the  implieational  frag¬ 
ment  of  propositional  calculus.  Only  two  logical  rules  are  present,  allowing 
introduction  and  elimination  of  implication1.  By  the  so-called  Curry-Howard 
isomorphism  this  system  is  the  same  as  the  pure  typed  A-calculus,  and  we 
shall  make  extensive  use  of  that  fact. 

The  restriction  to  the  implieational  fragment  of  propositional  logic  is  not 
essential  since  it  is  well-known  that  full  classical  propositional  logic  can  be 
embedded  in  it.  In  Section  1.1  we  will  discuss  this  embedding. 


1.1  Pure  implieational  logic  as  a  typed  A- 
calculus 

Formulas  are  built  up  from  propositional  variables  denoted  by  P,  Q  by  means 
of  (A  —i ►  B ).  2  Derivations  are  built  up  from  assumption  variables  x A,yB  by 
means  of  the  rule  — *+  of  implication  introduction  (or  A-abstraction) 

(XxArB)A~"B 

and  the  rule  — of  implication  elimination  (or  application) 

(tA~BsA)B. 

lThis  means  that  we  treat  so-called  minimal  logic,  formulated  in  Gentzen’s  natural 
deduction  style. 

2  We  have  chosen  the  unary  implication  A  —  B  instead  of  Ai,...,Am  —  B  just  for 
notational  convenience. 
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A  derivation  rB  whose  free  assumption  variables  are  xf' is  also 
called  a  derivation  of  B  from  Ai, . . . ,  Am.  For  readability  we  often  leave  out 
formula  superscripts  when  they  are  obvious  from  the  context  or  non-essential. 

As  an  example  we  give  derivations  of  the  two  Frege  formulas 

A-+(fl-»A)  (1.1) 

A-*(B-sC))-((A->B)-»(A-»C))  (1.2) 

A  derivation  of  1.1  is 

\xA(\yBxA ), 

and  a  derivation  of  1.2  is 

\uA-~{B~c)\vA~B\wA((uw)(vw)). 

Both  derivations  can  be  easily  written  in  the  more  usual  tree  form.  For 
obvious  reasons  we  will  also  use  the  word  term  for  derivations  and  type  for 
formulas.  The  possibility  to  treat  derivations  as  terms  and  formulas  as  types 
has  been  discovered  by  H.  B.  Curry  and  elaborated  by  W.  A.  Howard  in 
[How80a].  This  correspondence  can  easily  be  shown  to  be  an  isomorphism; 
it  is  called  the  Curry-Howard-isomorphism. 

Note  that  our  pure  implications)  logic  contains  full  classical  propositional 
logic,  as  follows.  Choose  a  particular  propositional  variable  and  denote  it 
by  X  (falsity).  Associate  with  any  formula  A  in  the  language  of  classical 
propositional  logic  a  finite  list  A  of  formulas  in  our  implicational  language, 
by  induction  on  A  3: 


P 
-A 
A~*  B 
A  A  B 
A  V  B 


P 

A  -»  X 

A —>  Bi,...,A-*  B„ 

A,  B 

{A-—*  -L),(Z?  — <  X)  — *  X 


Then,  if  A  is  a  formula  in  the  language  of  full  classical  propositional  logic  and 
Aj, . . . ,  Am,  is  its  associated  list,  A  is  derivable  in  classical  propositional  logic 
iff  each  A<  is  derivable  in  pure  implicational  logic  from  stability  assumptions 
-'-'P  — »  P  (with  ->B  denoting  B  — *  X)  for  all  propositional  variables  P  in 
A. 

3Ai, . . Am  — »  B  is  supposed  to  mean  ( A\  -•  (Aj  —  .  .(Am  — *  B)  ■ . .)) 
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1.2  Conversion 

We  are  interested  in  the  following  process  of  simplification  of  terms: 
(Xxx.r)ss  converts  into  (Ai.rr[s])s. 

Here  x  and  s  denote  finite  lists  X\  . . .  xm  and  S]  and  Axx.r  denotes 

Aii .  • .  AxmAxr.  Terms  of  the  form  (Xxx.r)ss  are  called  convertible  4.  We 
write 

r  — ♦  r' 

if  r'  is  obtained  from  r  as  follows.  Mark  some  occurences  of  convertible 
subterms  in  r.  Then  convert  them  all  simultaneously  5. 

More  precisely,  r  — *  r'  is  defined  by  the  following  rules 

1.  x  — >  x. 

2.  r  — *  r'  implies  Axr  — *  Axr', 

3.  r  — >  r'  and  s  — *  imply  rs  — >  rV, 

4.  r  —*  r'  and  s  — *  s'  and  s  — *  s'  imply  (Axx.r)ss  — +  (Ax.r^[s'])s'. 

As  a  special  case,  we  take 

rir' 

to  mean  that  r'  is  obtained  from  r  by  converting  exactly  one  convertible 
subterm  in  r.  Finally 


r  — »*  r  (r  reduces  to  r') 

‘'Note  that  converting  (Axz.r)ss  into  |Ax.r,.[s))s  may  be  viewed  as  first  converting 
(A xx.r)ss  “permutatively”  into  (AxlAxr)*)*  and  then  performing  the  inner  conversion  to 
obtain  (Ax.rr(«])r.  One  may  ask  why  we  take  this  conversion  relation  as  our  basis  and  not 
the  more  common  (Axr)s  t-»  r,(«i.  The  reason  is  that  our  notion  of  level  is  defined  with 
the  clause  level(  A  —  B)  =  max(< level  A)  + 1,  level  B)  and  not  =  max(!evel  A,  level  f?)+  1; 
this  in  turn  seems  reasonable  since  then  the  level  of  P| , . . .  Pm  —  Q  (i.  e.  of  (Pt  —  (Pj  — 
■  •  ■  (Pm  — *  0)  ■  ■  •))  )  i*  1  »nd  hence  independent  of  m.  But  given  this  definition  of  level, 

and  given  the  need  in  some  arguments  (e.  g.  in  Theorem  1.3.1)  to  perform  conversions  of 
highest  level  first,  we  must  he  able  to  convert  (Azy.r)sl  with  z  of  a  low  and  y  of  a  high 
level  into  (Ax. r, [(])».  — In  addition,  since  we  allow  more  conversions  here,  the  results  on 
strong  normalization  and  upper  bounds  for  the  length  of  arbitrary  reduction  sequences 
get  stronger. 

5Hence  new  convertible  subterms  generated  by  such  a  conversion  can  not  be  converted. 
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denotes  the  transitive  and  reflexive  closure  of  — *  (or  equivalently  of 

A  term  is  said  to  be  in  normal  form  if  it  does  not  contain  a  convertible 
subterm. 

We  want  to  show  now  that  any  term  reduces  to  a  normal  form.  This  can 
be  seen  easily  if  we  follow  a  certain  order  in  our  conversions.  To  define  this 
order  we  have  to  make  use  of  the  fact  that  all  our  terms  (i.e.  derivations) 
have  types  (i.e.  formulas). 

Define  the  level  of  a  formula  by 

level  P  =  0, 

Ievel(A  — *  B)  =  max((level  A)  +  1,  level  B). 

A  convertible  derivation 

(Ax *xA.r)ss 

is  also  called  a  cut  with  cut-formula  A.  By  the  level  of  a  cut  we  mean  the 
level  of  its  cut-formula.  The  cut-rank  of  a  derivation  r  is  the  least  number 
bigger  than  the  levels  of  all  cuts  in  r.  Now  let  t  be  a  derivation  of  cut-rank 
k  +  1.  Pick  a  cut 

(Axx.r)ss 

of  the  maximal  level  k  in  t,  such  that  s  does  not  contain  another  cut  of 
level  k.  (E.g.,  pick  the  rightmost  cut  of  level  k.)  Then  it  is  easy  to  see  that 
replacing  the  picked  occurrence  of  (Axx.r)as  in  t  by  ( Af  .rI[s])s  reduces  the 
number  of  cuts  of  the  maximal  level  k  in  t  by  1 .  Hence 

Theorem  1.2.1  We  have  an  algorithm  which  reduces  any  given  term  into  a 
normal  form.  □ 

We  now  want  to  give  an  estimate  of  the  number  of  conversion  steps  our 
algorithm  takes  until  it  reaches  the  normal  form.  The  key  observation  for 
this  estimate  is  the  obvious  fact  that  replacing  one  occurrence  of 

(Xxx.r)ss  by  (Ax.rr[s])s 

in  a  given  term  t  at  most  squares  the  length  of  t;  here  the  length  of  t  is 
t  aken  to  be  the  number  of  variables  in  t  (except  those  immediately  following 
a  A-symbol). 

A  bound  s*l  for  the  number  of  steps  our  algorithm  takes  to  reduce  the 
rank  of  a  given  term  of  length  I  by  k  can  be  derived  inductively,  as  follows. 
Let  sqI  :=  0.  To  obtain  »k+il,  first  note  that  by  induction  hypothesis  it  takes 


<  skl  steps  to  reduce  the  rank  by  k.  The  iength  of  the  resulting  term  is 

<  l2'  where  s  :=  skl  since  any  step  (i.e.  conversion)  at  most  squares  the 
length.  Now  to  reduce  the  rank  by  one  more  the  number  of  additional  steps 
is  obviously  bounded  by  that  length.  Hence  the  total  number  of  steps  to 
reduce  the  rank  by  k  +  1  is  bounded  by 

•Sjcf  +  f2  *  —•  Sls+l t- 

Theorem  1.2.2  (Upper  bound  for  the  complexity  of  normalization) 
The  normalization  algorithm  given  in  the  proof  of  Theorem  1.2.1  takes  at 
most  Ski  steps  to  reduce  a  given  term  of  cut-rank  k  and  length  l  to  normal 
form,  where 

Sol  :=  0  and  sk+il  :=  s*i  4-  P*‘\o 


1.3  Uniqueness 

We  shall  show  that  the  normal  form  of  a  term  is  uniquely  determined  6.  The 
main  idea  of  the  proof  (due  to  J.  B.  Rosser  and  W.  W.  Tait)  is  to  use  the 
relation  r  — *  r'  defined  in  section  1.2.  Its  crucial  property  is  given  by 

Lemma  1.3.1 

I  Ze  }=”.M 

The  proof  is  by  induction  on  the  definition  of  r  — *  r\  All  cases  are  obvious 
exept  possibly  rule  4.  So  assume  r  — *  r',  s  — »  s'  and  s  — *  s'.  Then 

ry[t\  -»  r'y[t'  1,  s„(<]  -*  s'y[<T  and  sv[t )  -*  sy[t'] 

by  induction  hypothesis,  and  hence 

(Aji.r„[tl)sv[t]sv[t]  —  (Ax.(ry[t'])r|sy(t/|])s'>(f'j 

by  definition  of  — ♦  .  D 

Lemma  1.3.2  Assume  r  —*  r'  and  r  —»  r".  Then  we  can  find  a  term  r"' 
such  that  r'  —*  r'"  and  r"  — *  r'". 

•The  argument  in  this  section  also  applies  to  type-free  terms,  i.e.  terms  without  formula 
superscripts. 
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The  proof  is  by  induction  on  the  definition  of  r  — >  r'.  Again  all  cases  are 
obvious  exept  possibly  the  situation  where  either  r  — *  r'  or  r  — »  r"  is  obtained 
via  rule  4.  By  symmetry  we  may  assume  the  former.  But  then  the  claim 
follows  from  Lemma  1.3.1:  If 


and 
then 
and 
and  if 

and 

then 


(Aioc. r)ss  — >  (AT.r^.[s'])s' 

(Aji.r)is  — t  (A xx.r")s"s", 
(Xxx.r)ss  —*  (Ax.r^.]s'])s'  — *  ( Xx.r'"\s'"])s'" 
(Xxx.r)ss  -  (A xx.r")s7's"  -  (Af.r"'[s"'])s»', 
(Xxxyy.r)sstt  — »  (Xxyy  .r'x[s'])s't't' 
(Xxxyy.r)sstt  — *  (Afxy.r"[<"])s"s"f" 


(Xxxyy.r)sstt  -*  ( Axyy.r(.[s'])s'<V  (A xy.r™v[s"',t'"])s7"t"' 

and 


(Xxxyy.r)sstt  — » (Xxxy.r'^\t"])s"s"i''  — »  (A xy.r'"l/[s"',t"'})s"'tr" 

Theorem  1.3.1  (Church-Rosser)  Assume  r  — ♦*  r'  and  r  — ►*  r".  Then  we 
can  find  a  term  r"'  such  that  r'  — »*  r'"  and  r"  — »'  r'" . 

The  proof  is  immediate  from  Lemma  1.3.2.  O 

Corollary  1.3.2  Assume  r  — *'  r'  and  r  — »*  r",  where  both  r'  and  r"  are  in 
normal  form.  Then  r'  and  r"  are  identical.  Q 
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1.4  A  lower  bound  for  the  complexity  of  nor¬ 
malization 

In  Theorem  1.2.2  we  have  obtained  an  upper  bound  on  the  number  of  conver¬ 
sion  steps  our  particular  normalization  algorithm  of  Theorem  1.2.1  takes  to 
reach  the  normal  form.  This  upper  bound  was  superexponential  in  the  length 
of  the  given  term.  It  is  tempting  to  think  that  by  choosing  a  clever  norma¬ 
lization  strategy  one  might  be  able  to  reduce  that  bound  significantly.  It  is 
the  purpose  of  the  present  section  to  show  that  this  is  impossible.  More  preci¬ 
sely,  we  will  construct  terms  en  of  length3n  and  show  that  any  normalization 
algorithm  needs  at  least  2n_j  —  n  conversions  (with  2o  :=  1.2n+i  :=  22")  to 
reduce  r„  to  its  normal  form. 

The  fact  that  there  is  no  elementary  algorithm  (i.  e.  whose  time  is  expo¬ 
nentially  bounded)  to  compute  the  normal  form  of  terms  also  follows  from 
Statman  [Sta79|.  where  it  is  shown  more  generally  that  the  problem  whether 
two  terms  r |  and  r2  have  the  same  normal  form  is  not  elementary  recursive. 
The  simple  example  treated  here  is  taken  from  [Sch82,  p.  455] 

The  pure  types  k  are  defined  inductively  by  0  :=  P  (some  fixed  propo¬ 
sitional  variable)  and  k  +  1  =  k  — *  k.  We  define  iteration  terms  /„  of  pure 
type  k  2  by 

/B:=A/Ax/(/(... /(/*)...)), 

with  n  occurrences  of  /  after  A/\x;  here  /.  x  are  variables  of  type  k  +  1.  k. 
respectively.  Let  /  o  j  be  an  aboreviation  for  A xf(gx),  and  let  r  =  s  mean 
that  r  and  s  have  the  same  normal  form.  With  this  notation  we  can  write 

/n  =  xf  /  °  f  °  • ' '  °  L 

n 

The  main  point  of  our  argument  is  the  following  simple  lemma,  which  can 
be  traced  back  to  Rosser  (cf.  [Chu41,  p.  30]) 

Lemma  1.4.1  /.  (Imf)  o  (/„/)  =  /m+n/ 

3-  fm  0  L>  =  fm-.n 

3-  ImIn  =  Aim  ^ 

As  an  immediate  consequence  we  have 


Now  consider  any  sequence  of  reduction  steps  transforming  rn  into  its  normal 
form,  and  let  s„  denote  the  total  number  of  reduction  steps  in  this  sequence. 

Theorem  1.4.1  s„  >  2n_2  —  n 

Proof:  The  length  of  rn  is  3 n.  Note  that  any  conversion  step  can  at  most 
square  the  length  of  the  original  term.  Hence  we  have 

2n  <  length  of  /2n  (the  normal  form  of  rn) 

<  (length  of  r„)2"" 

=  (3n)3*" 

<  22''+,"  (since  3n  <  22"), 
and  the  theorem  is  proved.  □ 

1.5  Strong  normalization 

In  Section  1.2  we  have  proved  that  any  term  can  be  reduced  to  a  normal 
form,  and  in  Section  1.3  we  have  seen  that  this  normal  form  is  uniquely 
determined.  But  it  is  still  conceivable  that  there  might  be  an  odd  reduction 
sequence  which  does  not  terminate  at  all.  It  is  the  aim  of  the  present  section 
to  show  that  this  is  impossible.  This  fact  is  called  the  strong  normalization 
theorem. 

For  the  proof  we  employ  a  powerful  method  due  to  W.W.  Tait,  which 
is  based  on  so-called  strong  computability  predicates.  These  are  defined  by 
induction  on  the  types  (i.  e.  formulas)  as  follows. 

A  term  rA  with  A  of  level  0  (i.  e.  a  propositional  variable)  is  strongly 
computable  iff  r  is  strongly  normalizable,  i.  e.  every  reduction  sequence 
starting  from  r  terminates.  A  term  rA~"B  is  strongly  computable  iff  for  all 
strongly  computable  s'4  also  (rs)fl  is  strongly  computable. 

A  term  r  is  strongly  computable  under  substitution  iff  for  all  strongly 
computable  s  the  result  of  substituting  s  for  all  variables  free  in  r  is  again 
strongly  computable. 

Lemma  1,5.1  Let  A  be  a  formula. 

1.  ny  strongly  computable  term  rA  is  strongly  normalizable. 

2.  xA  is  strongly  computable. 
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We  prove  1  and  2  simultaneously  by  induction  on  A.  For  A  of  level  0  both 
claims  are  obvious.  Now  consider  A  — +  B.  For  1,  assume  that  rA~~B  is 
strongly  computable.  By  induction  hypothesis  2  and  the  definition  of  strong 
computability  we  know  that  (rx)e  is  strongly  computable  and  hence  that 
any  reduction  sequence  starting  with  rx  terminates  (by  induction  hypothesis 
l).  But  this  obviously  implies  that  the  same  is  true  for  r.  For  2,  assume  that 
r  are  strongly  computable.  We  have  to  show  that  xr  (which  is  to  be  of  level 
0)  is  strongly  computable,  i.  e.  that  any  reduction  sequence  starting  with  xr 
terminates.  But  this  follows  from  induction  hypothesis  1,  which  says  that 
any  reduction  sequence  starting  from  r,  terminates.  □ 

Lemma  1.5.2  If  r  r'  and  r  is  strongly  computable,  then  r'  is  strongly 
computable. 

Proof:  Let  s  be  strongly  computable.  We  have  to  show  that  r's  is  strongly 
computable,  i.  e.  that  any  reduction  sequence  starting  from  r's  terminates. 
But  this  is  obviously  true,  because  otherwise  we  would  also  have  an  infinite 
reduction  sequence  for  rs.  O 


Lemma  1.5.3  Any  term  r  is  strongly  computable  under  substitution. 


The  proof  is  by  induction  on  the  height  of  r.  [FJ  obvious.  |  rs  [  Let  t  be 
strongly  computable.  We  have  to  show  that  r(t]s[<]  is  strongly  computable. 
But  this  holds,  since  by  induction  hypothesis  we  know  that  r[t]  as  well  as 
s[t]  are  strongly  computable.  [  Axx.r  Let  t  be  strongly  computable.  We  have 


to  show  that  Axx.r{<]  is  strongly  computable.  So  let  s,  s  and  r  be  strongly 
computable.  We  must  show  that  (Axx.r[t])Jsr  is  strongly  computable,  i.  e. 
than  any  reduction  sequence  for  it  terminates.  So  assume  we  have  an  infinite 


reduction  sequence.  Since  r[t],  3,  s  and  f  all  are  strongly  normalizable,  there 
must  be  a  term  (Axx.r[i]')s'sV  with  r(<]  — **  r[f]',  s'  — **  s,  s  — »*  s'  and  r  -+*  r' 


in  that  reduction  sequence  where  a  “head  conversion”  is  applied,  which  we 
may  assume  to  yield 

(Ax.(r[t]')M)sV. 

But  r(t]  — ►*  r[t]'  implies  Ax.r(s,  t]  — **  Ax.(r[t]')[s'],  and  hence  the  fact  that 
A x.r  is  (by  induction  hypothesis)  strongly  computable  under  substitution  to¬ 
gether  with  Lemma  1.5.2  implies  that  (Ax.(r(i]')[s']  is  strongly  computable. 
But  then,  again  by  Lemma  1.5.2,  also  (Ax.(r(<])'[s'])i*'r'  is  strongly  compu¬ 
table  and  therefore  strongly  normalizable.  This  contradicts  our  assumption 
above  that  we  have  an  infinite  reduction  sequence,  □ 

From  Lemma  1.5.3  and  both  parts  of  Lemma  1.5.1  we  can  conclude  im¬ 
mediately 
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Theorem  1.5.1  Any  term  r  is  strongly  normalizable.  □ 


1.6  An  upper  bound  for  the  length  of  ar¬ 
bitrary  reduction  sequences 

By  section  1.5  we  already  know  that  the  full  reduction  tree  for  a  given  term 
is  finite;  hence  its  height  bounds  the  length  of  any  reduction  sequence.  How¬ 
ever,  it  is  not  obvious  how  a  reasonable  estimate  for  that  height  might  be 
obtained. 

Here  we  note  that  a  much  simpler  tree  (to  be  called  below  0-tree)  which 
is  essentially  the  head  reduction  sequence  (and  branches  only  in  the  case 
xt i . . .  t„  with  successors  tiyt,. . . ,  t„yn )  has  the  property  that  the  number  of 
its  nodes  with  conversions  bounds  the  length  of  any  reduction  sequence7.  The 
height  of  that  tree,  and  hence  also  the  number  of  its  nodes,  can  be  estimated 
using  a  technique  due  to  Howard  [How80b],  which  in  turn  is  based  on  work 
of  Sanchis  [San67]  and  Diller  [Dil68].  This  gives  the  desired  upper  bound. 

The  method  of  Gandy  [Gan80b]  can  also  be  used  to  obtain  a  bound  for 
the  length  of  arbitrary  reduction  sequences;  this  is  carried  out  in  [Sch82]. 
However,  the  bound  derived  here,  apart  from  being  more  intelligible,  is  also 
better. 

For  any  term  r  of  level  0  a  head  reduction  tree  of  degree  fc  (to  be  called 
fc-tree)  for  r  is  generated  by  the  following  rules.  All  terms  in  that  tree  are 
supposed  to  have  level  0. 

1.  (Xxr)st  has  successor  r*[a]l. 

2.  itr . . .  tn  has  successors  t\y (, . . . ,  tnyn.  In  particular,  x  of  level  0  has  no 
successor. 

3.  rti ...  tn  with  r  level  <  fc  and  n  >  0  has  successor  ry,  tjj/i, . . . ,  tnyn. 

Note  that  for  (Azr)s<  with  z  of  level  >  k  only  rule  1  (and  not  rule  3)  can  be 
applied  when  constructing  a  fc-tree. 

Hence  the  relation  (r(k  <  a  for  terms  r  of  level  0  (to  be  read:  r  has  a 
fc-tree  of  height  <  a)  can  be  defined  inductively,  as  follows: 

7This  is  not  quite  true,  but  only  for  so-tailed  A-l-terms,  where  any  variable  bound  by 
A  actually  occurs  in  the  kernel.  But  the  general  case  can  be  easily  reduced  to  this  one  by 
introducing  dummy  variables;  this  is  carried  out  below. 
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1.  If  |rx[.s]t]fc  <  a0  <  a,  then  |(Air)st|t  <  a. 

2.  If  Itij/iU  <  <  a  for  i  =  then  l-ril*  <  a.  In  particular, 

|x|fc  <  a  for  any  a,  for  x  of  level  0. 

3.  If  |ryj*  <  a0  <  a  and  <  a,-  <  a  for  i  =  1, . . . ,  n,  and  if  r  has  level 

<  fc,  then  (rfjfc  <  a. 

Lemma  1.6.1  tf\r[k  <  a  and  (syyj|k  <  b  {or  j  =  2, . . . ,  m,  and 
have  levels  <  k,  then  |r?(s]|k  <  b+  a. 

The  proof  is  by  induction  on  the  generation  of  |r|t  <  a.  We  write  2*  for  (?[«!• 

1.  |r*(s*]<'|k  <6  +  ao<A  +  aby  induction  hypothesis,  and  hence  we  have 
|(Air')s*<*|jt  <  6  4-  a. 

2.  (fffiU  <  b  +  a,  <  b  +  a  for  i  =  1 . m  by  induction  hypothesis,  and 

hence  lit'U  <  b  +  a.  If  x  is  one  of  the  xj  to  be  substituted  by  Sj,  we 
have  to  use  rule  3  instead.  This  can  be  done  since  s}  has  level  <  k  and 
by  aLssumption  <  b.  Now  in  case  a  is  not  zero  we  have  b  <  b  +  a 

and  hence  <  b  +  a  by  rule  3.  In  case  a  is  zero  there  are  no  t,‘s 

and  we  have  used  rule  2  to  generate  jxj*  <  0.  Then  |Sj|*  <6  +  0  holds 
by  assumption. 

3-  Ir'f|fc  <6  +  ao<6  +  a  and  <  6  +  a;  <  6  +  a  hold  by  induction 

hypothesis,  and  hence  |r*<*|  <  6  +  a.  O 

Lemma  1.6.2  //Mit-H  <  a,  then  |rj*  <  2“. 

The  proof  is  by  induction  on  the  generation  of  |r|fc+i  <  a.  The  only  case  which 
requires  some  attention  is  when  |r<]t+i  <  a  was  generated  from  |ry)*+i  < 
a0  <  a  and  |<jyi|k+i  <  a,-  <  a  for  t  =  l,...,n  by  rule  3,  and  r  has  level 
k  +  1.  By  induction  hypothesis  we  then  have  |ry|k  <  2a°  and  <  2ttl 

for  i  =  1,. . .,n.  Now  by  the  previous  Lemma  we  can  conclude  that  |r<]fc  < 

2mu(ai . an)  2°«  <  2a.  O 

It  now  follows  that  from  a  bound  for  the  height  of  some  fc-tree  for  r  (and 
for  sufficiently  big  It  such  a  bound  can  be  obtained  easily;  see  Lemma  1.6.5 
below)  we  can  derive  a  bound  for  the  height  of  the  0-tree  of  r.  Note  that 
this  0-tree  is  uniquely  determined,  since  only  rules  1  and  2  can  be  applied; 
we  shall  call  it  the  head  reduction  tree  for  r  and  write  |r|0  for  its  height,  i.  e. 
for  the  least  a  such  that  |rj0  <  a. 
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Our  key  observation  is  that,  under  a  slight  additional  hypothesis,  the 
number  #r  of  conversions  in  the  head  reduction  tree  of  r  bounds  the  length 
of  any  reduction  sequence.  More  precisely,  #r  is  defined  for  any  term  r  of 
level  0  by  induction  on  the  generation  of  |r|o  <  a  by  the  rules  above,  i.  e. 

1.  #(\xr)st:=  (#rr[s]t)  +  1, 

2.  #*<!...<„:=  £#*.£. 

i=l 

Note  that  from  1.  we  can  conclude 

#(Aix.r)sst  =  (#(  Ax  x.rr[s])st  +  1; 
this  will  be  used  below. 

A  term  is  called  a  A-I-term  if  in  its  generation  a  subterm  Air  is  formed 
only  in  case  x  €  FVr. 

Lemma  1.6.3  Let  r  be  a  \-l-term  of  level  0.  Then  r  r'  implies  that 
#r  >  #r'. 

Proof.  We  show  more  generally  that  for  r  a  A-I-term  with  z  6  FVr  we  have 
#rt[(\xx.p)qq}  >  ^((Ax.pxfa])?] 

For  brevity  we  write  t‘  for  it[{Xxx.p)qq)  and  t'  for  t,((Ax.pr[7))fl].  The  proof 
is  by  induction  on  #r*. 

#((Axr  )at?  =  (#(rt[a]t)*)  +  1 

>  (#(M*]ty)  +  l 

=  #((Axr)s<)'. 

where  the  >  follows  by  the  induction  hypothesis.  Note  that  for  the  applica¬ 
tion  of  the  induction  hypothesis  here  we  have  used  x  6  FVr ,  which  follows 
from  our  assumption  that  we  are  dealing  with  A-I-terms. 

#(y<V  = 

t 

> 

t 

-  #(v0' 

=  #(A  Xx.p)qqV 

38  (#(-'*-Pr(9])^*)  +  1 

>  (#(Af.p,[?l)9f )  +  1 

>  #(A*.pr[ql)q*' 
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Furthermore,  it  is  easy  to  estimate  #r  in  terms  of  |r|0: 

Lemma  1.6.4  Let  m  be  the  maximal  number  of  premisses  in  formulas  taken 
as  free  assumptions  in  a  derivation  r  (of  a  formula  of  level  Oj.  Then 

#r  <  m^° . 

The  proof  is  by  induction  on  the  generation  of  |r[0  <  a. 

#(A  xr)st  =  (#rr[s]f)  +  1 

<  m|r*wi,°  +  1 

<  ml(Air)jilo 

SUi 

1  =  1 

i=t 

<  n  ■ 

<  since  n  <  m.  □ 

We  now  give  an  estimate  for  the  height  of  some  k- tree  for  r,  for  a  sufficiently 
big  k.  Here  we  need  the  notion  of  the  height  of  a  term  r,  which  is  defined  by 

1.  height  i  =  0 

2.  height  Air  =  (  height  r)  +  1 

3.  height  ts  =  max((height  t)  +  1,  (height  s)  +  1) 

Lemma  1.6.5  For  any  variable  x  we  have,  with  an  arbitrary  k, 

|xy|fc  <  level  x.  (1.3) 

For  any  term  r,  if  all  subterms  of  r  have  levels  <  k,  then 

I ry\k  <  k  +  (height  r).  (1.4) 

Proof:  1.3  can  be  seen  easily  by  induction  on  the  level  of  i: 

|yi£ U  <  level  y;  <  level  i 
for  t  =  1, . . . ,  n  by  induction  hypothesis,  and  hence 

|*yl*  <  level  x. 

1.4  is  proved  by  induction  on  the  height  of  r. 


(i)  If  r  is  a  variable,  the  claim  follows  from  1.3 

(ii)  (A xr)yy.  By  induction  hypothesis, 

My]y|*  <  k  +  (height  r) 

<  k  +  (height  Air). 

Hence,  by  rule  1, 

j(Axr)yyj).  <  k  +  (height  Air). 

(iii)  rty.  By  induction  hypothesis, 

\ryy\k  <  k  +  height  r  <  k  +  height  rt 
<  k  +  height  t  <  k  +  height  rt 

and  by  1.3 

|y<s<U  ^  'evel  yi  <  k- 

Hence,  by  rule  3, 


MvW  ^  k  +  height  rt.  □ 

In  Lemma  1.6.3  we  needed  the  hypothesis  that  r  is  a  A-I-term  in  order  to 
conclude  >  ^jtr'  from  r  r'.  This  hypothesis  is  certainly  necessary,  since 
in  non-A-I-terms  subterms  can  disappear  by  means  of  conversions,  and  hence 
the  head  reduction  tree  may  not  show  any  trace  of  a  conversion  inside  the 
term.  An  example  is  (Aiy)((Aip)g)  and  ( Axy)(pr[</]),  both  of  which  have  the 
same  head  reduction  tree  (consisting  of  one  additional  node  labeled  y). 

However,  we  can  easily  reduce  the  general  case  to  the  case  of  A-I-terms. 
To  achieve  this  we  just  introduce  dummy  variables  which  turn  the  given 
term  r  into  a  A-I-term  r*  (a  variant  of  r,  as  we  shall  say),  and  note  that 
the  length  of  any  reduction  sequence  for  r  is  bounded  by  the  length  of  a 
reduction  sequence  for  r*. 

By  an  immediate  variant  of  a  term  r  of  type  A  —*  P  we  mean  a  term 
r'  =  Ay.ut(ry), 

where  t  is  any  term  of  some  type  B  and  u  is  a  new  variable  of  type  B.  F  — »  P: 
the  variables  y  are  supposed  to  have  types  A.  Note  that  r'  has  the  same  type 
A  — ►  P  as  r.  Call  a  term  r<m*  an  m-fold  immediate  variant  of  r  if  there 
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are  terms  r*0',  r*1', . . .  such  that  r(°>  =  r  and  r(,+1>  is  an  immediate 

variant  of  r'1*.  Finally  a  term  r*  is  called  a  variant  of  r  if  it  is  obtained  from 
r  by  taking  possibly  multiple  immediate  variants  of  all  of  its  subterms.  More 
precisely,  r*  is  a  variant  of  r  if  variant  rr*  can  be  derived  by  the  rules 

1.  variant  xx(m* 

2.  If  variant  rr”,  then  variant  (Axr)(Axr*)*m'. 

3.  If  variant  tt’  and  variant  ss",  then  variant  (<s)(<*s*)'m*. 

It  is  obvious  that  for  any  term  we  can  find  a  variant  which  is  a  A-I-term: 
just  replace  any  subterm  Air  with  x  £  FVr  by  its  immediate  variant 

A  xy.ux(ry). 

As  noted  above,  it  suffices  to  prove 

Lemma  1.6.6  //  r  -U  ri  and  r*  is  a  variant  of  r,  then  we  can  find  a  variant 
rj  of  C]  such  that  r"  — *+  r where  — >+  is  defined  just  as  — ►*  exept  that 
refiexivity  is  not  allowed. 

For  the  proof  we  need  two  observations. 

r's  converts  into  (rs)'  (1.5) 

(Af.(Axr)<m>)5s  -*  (Ax.((Axr)s)(m>)J’  (1.6) 

1.5  follows  from  the  fact  that 


(A yy.ut(ryy))s  converts  into  Xy.ut(rsy). 


1.6  is  proved  by  induction  on  m.  The  case  m  =  0  is  immediate.  Suppose 
m  >  0.  Writing  p  for  (Axr)'m_1',  we  have 


(Ax.(Axr)*m*ss 


(Ax.p')ss 

(\x.\yy.ut(pyy))ss 

(Xx.\y.ut(psy))s 

(Ax.(ps)V 

(Ax.((Axr)!m-‘>s)')i' 

(Ax.((Axr)s)<m>)s, 
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where  in  the  last  step  we  have  used  1.5. 

For  the  proof  of  the  Lemma  we  restrict  ourselves  to  the  case 

(Aiix.r)sis  4  (Aiirr(s])«i; 

the  other  cases  are  similar  or  immediate  by  the  induction  hypothesis.  An 
arbitrary  variant  of  (Aii  x.r^s  must  have  the  form 

(((A*1(Axr*)<ra))<m>>s;)<B'V)‘n) 

Because  of  1.3  we  may  assume  that  this  variant  has  the  more  special  form 

(Ax,(Axr’)(m>)sJs*. 

Because  of  1.4  we  know  that  this  term  reduces  to 
(Ax,((Axr-)s*)«m»)s;. 

Now  this  term  clearly  reduces  to 

(Ax,(r;[s*])<m>K, 

which  is  a  variant  of  (Axir,[s])«i.  □ 

To  summarize,  we  have  the  following  result. 

Theorem  1.6.1  Let  r  be  a  derivation  of  a  formula  of  level  0,  e.  of  a 
propositional  variable.  Let  r*  be  a  \-l-variant  of  r.  Let  k  and  m  be  such 
that  all  subterms  of  r  have  levels  <  k,  and  that  all  assumptions  free  in  r  have 
<  m  premisses.  Then  the  length  of  an  arbitrary  reduction  sequence  for  r  with 
respect  to  — *  is  bounded  by 


mJ*(fc+(height  r*) 
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Chapter  2 

Normalization  for  first-order 
logic 


We  restrict  our  attention  to  the  — *V-fragment  of  first-order  logic  with  just 
introduction  and  elimination  rules  for  both  symbols,  i.  e.  with  minimal  logic 
formulated  in  natural  deduction  style.  This  restriction  does  not  mean  a 
loss  in  generality,  since  it  is  well  known  that  full  classical  first-order  can  be 
embedded  in  this  system;  the  argument  for  that  fact  is  sketched  in  section  2.1. 
Equality  is  not  treated  as  a  logical  symbol,  but  can  be  added  via  suitable 
equality  axioms. 

We  extend  our  results  and  estimates  on  normalization  to  first-order  logic 
by  the  method  of  collpasing  types.  Applications  include  the  subformula 
property,  Herbrand’s  theorem  and  the  interpolation  theorem. 


2.1  The  — ♦V-fragment  of  first-order  logic  as 
a  typed  A-calculus 

Assume  that  a  fixed  (at  most  countable)  supply  of  function  variables 
and  predicate  variables  P,Q,. ..  is  given,  each  with  an  arity  >  0.  Terms  are 
built  up  from  object  variables  x,  y ,  z  by  means  of  frt  . . .  rm .  Formulas  are 
built  up  from  prime  formulas  Pri  ...rm  by  means  of  (A  —*  B)  and  VxA. 
Derivations  are  built  up  from  assumption  variables  xA,yA  by  means  of  the 
rule  — of  implication  introduction 

( XxArB)A~B , 
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the  rule  — of  implication  elimination 

(tA~BsA)B, 

the  rule  V+  of  V-introduction 


(AxrA)VzA, 

provided  that  no  assumption  variable  yB  free  in  rA  has  x  free  in  its  type  B, 
and  finally  the  rule  V~  of  V^elimination 

Each  of  the  rules  — >+,V+  and  V~  has  a  uniquely  determined  derivation  as 
their  premiss,  whereas  has  the  two  derivations  tA~'B  and  sA  as  premisses. 
Here  tA~"B  is  called  the  main  premiss  and  s'*  is  called  the  side  premiss. 

As  an  example  we  give  a  derivation  of 

Vx(Px  — *  Qx)  — >  (VxPx  — »  VxQx). 


Such  a  derivation  is 


\v*zPz \x((ux)(vx)) . 

Derivation  can  be  easily  written  in  the  more  usual  tree  form.  We  will  continue 
to  use  the  word  term  for  derivations  (as  long  as  this  does  not  lead  to  confusion 
with  the  notion  of  (object)  term  inherent  in  first-order  logic),  and  type  for 
formula. 

Note  that  our  (— ►  V-fragment  of)  minimal  logic  contains  full  classical 
first-order  logic.  This  can  be  seen  as  follows: 

1.  Choose  a  particular  propositional  variable  and  denote  it  by  X  (falsity). 
Associate  with  any  formula  A  in  the  language  of  classical  first-order  logic  a 
finite  list  A  of  formulas  in  our  — *  V-fragment,  by  induction  on  A: 


Pr 

b— b 

PP 

-■A 

b-i 

A  -*  X 

A-*B 

A-*  A  Bn 

AAB 

A,  B 

AM  B 

(A->±),(B~>  1)-* 

MxA 

>— b 

VxAj, . . . ,  VxAm 

3xA 

>— ► 

Vx(A  -*  X)  — »  X 
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2.  In  any  model  M  where  _L  is  interpreted  by  falsity,  we  clearly  have  that 
a  formula  A  in  the  language  of  full  first-order  logic  holds  under  an  assignment 
a  iff  all  formulas  in  the  assigned  sequence  A  hold  under  a. 

3.  Our  derivation  calculus  for  the  — >V-fragment  is  complete  in  the  follo¬ 
wing  sense:  A  formula  A  is  derivable  from  stability  assumptions 

Vxf-.-Px  -  Px) 

for  all  predicate  symbols  P  in  A  iff  A  is  valid  in  any  model  under  any  assign¬ 
ment. 

2.2  Strong  normalization 

Here  we  use  the  method  of  collapsing  types  (cf.  [TvD88,  p.560])  to  transfer 
our  results  and  estimates  concerning  strong  normalization  from  implicational 
logic  to  first-order  logic. 

The  notions  concerning  conversion  introduced  in  section  1.2  ran  be  easily 
extended  to  first-order  logic.  In  particular,  we  have 

(Xxx.r)ss  converts  into  (Ai.rr[sj)s, 

where  the  variables  x,  x  now  can  be  either  assumption  variables  or  else  object 
variables.  The  rules  generating  the  relation  r  — >  r'  are  extended  by  requiring 
r  — *  r  for  object  terms  r  of  our  first-order  logic.  Again  a  derivation  is  said 
to  be  in  normal  form  if  it  does  not  contain  a  convertible  subderivation. 

For  any  formula  A  of  first-order  logic  we  define  its  collapse  Ac  by 

{Pf)c  =  P 
(A  -*  B)c  s  A‘-Bc 
(VxA)c  =  T  -  Ac 

where  T  :=  X  -+  X  with  X  a  fixed  propositional  variable  (i.  e.  T  means 
truth).  The  level  of  a  formula  A  of  first-order  logic  is  defined  to  be  the  level 
of  its  collapse  Ac.  For  any  derivation  rB  in  first-order  logic  we  can  now  define 
its  collapse  (rB)c.  It  is  plain  from  this  definition  that  for  any  derivation  rB 
in  first-order  logic  with  free  assumption  variables  xf' , . . . ,  xAm  the  collapse 
(rB)c  is  a  derivation  (rc)B'  in  implicational  logic  with  free  assumption  varia¬ 
bles  xf,,...,x#>. 

( xA)c  =  xA‘ 
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(A  xAr)c  =  XxA‘rc 
(tA~Bs)c  s  tcsc 
(A  xr)c  s  A  a:V 
(tVlAs)c  s  lc(\ ^xzi)T 

Note  that  for  any  derivation  rB,  assumption  variable  xA  and  derivation  sA  we 
have  that  r'fa'j  is  a  derivation  in  implicationai  logic  (where  the  substitution 
of  sc  is  done  for  the  assumption  variable  i'4'),  which  is  the  collapse  of  r[s]. 
Also  for  any  derivation  rB ,  object  variable  i  and  term  s  we  have  that  rI[sj 
is  a  derivation  of  with  collapse  (rr(s])c  =  rc. 

Lemma  2,2.1  If  r  A  r'  in  first-order  logic,  then  rc  4  (r'j^  in  implicationai 
logic. 

The  proof  is  by  induction  on  the  generation  of  r  4  r' .  We  only  treat  the 
case 

(Air)s  4  rr(sj. 

If  x  is  an  assumption  variable,  then 

((A*Ar*V)c  =  (A**V)*e 

4  r'(s'\ 

=  (rW)e), 

by  the  note  above.  If  x  is  an  object  variable,  then 

((XxA)s)c  =  (AzV)(Az1j1)t 

4  rc 

5  (r(a])‘), 

again  by  the  note  above.  □ 

Hence  from  Theorem  1.5.1  we  can  conclude 

Theorem  2.2.1  Any  derivation  r  in  first-order  logic  is  strongly  normaliza¬ 
ble.  Q 

Also  we  can  apply  Theorem  1.6.1  to  obtain  an  upper  bound  for  the  length 
of  arbitrary  reduction  sequences. 
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Theorem  2.2.2  Let  r  be  a  derivation  in  first-order  logic  of  a  formula  of 
level  0,  i.  e.  a  prime  formula.  Let  rc  be  the  collapse  of  r  into  implicational 
logic ,  and  rc"  be  a  X-I-variant  of  rc  (cf.  the  remark  before  Lemma  1.6.6). 
Let  k  and  m  be  such  that  all  subderivations  of  rc *  have  levels  <  k.  and  all 
assumption  variables  free  in  rcm  have  <  jn  premisses.  Then  the  length  of  an 
arbitrary  reduction  sequence  for  r  with  respect  to  — *  is  bounded  by 

m2i,(t+height  r")  q 


2.3  Uniqueness 

The  Church-Rosser  Theorem  and  hence  the  uniqueness  of  the  normal  form 
for  derivations  in  first-order  logic  can  be  proved  exactly  as  in  section  1.3. 
We  do  not  repeat  this  here. 


2.4  Applications 

Here  we  want  to  draw  some  conclusions  from  the  fact  that  any  derivation 
in  first-order  logic  can  be  transformed  into  normal  form.  The  arguments  in 
this  section  are  based  on  Prawitz’  book  [Pra65].  We  begin  with  an  analysis 
of  the  form  of  normal  derivations. 

Let  a  derivation  r  be  given.  A  sequence  rt, . . . ,  rm  of  subderivations  of  r 
is  a  path  if 

1.  rq  is  an  assumption  variable, 

2.  r,  is  the  main  premiss  of  r,+],  and 

3.  rm  is  either  the  whole  derivation  r  or  else  the  side  premiss  of  an  instance 
of  the  rule  — within  r. 

It  is  obvious  that  any  subderivation  of  r  belongs  to  exactly  one  path.  The 
order  of  the  path  ending  with  the  whole  derivation  r  is  defined  to  be  0.  and 
if  the  order  of  the  path  through  the  main  premiss  t  of  some  instance  tA~~BsA 
of  the  rule  — in  r  is  it,  then  the  order  of  the  path  ending  with  that  sA  is 
defined  to  be  It  +  I  • 

The  relation  “A  is  a  subformula  of  Bn  is  defined  to  be  the  transitive  and 
reflexive  closure  of  the  relation  “immediate  subformula”,  defined  by 

1.  A  and  B  are  immediate  subformulas  of  A  — *  6, 
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2.  Ar[r]  is  an  immediate  suhformula  of  Var/4. 


I 

4 


We  will  also  need  the  notion  “A  is  a  strictly  positive  subformula  of  B" .  which 
is  defined  to  be  the  transitive  and  reflexive  closure  of  the  relation  “immediate 
strictly  positive  subformula'’,  defined  by 

1.  B  is  an  immediate  strictly  positive  subformula  of  A  — *  B, 

2.  Arjr]  is  an  immediate  strictly  positive  subformula  of  VxA. 

In  a  normal  derivation  r  any  path  rf rAm  has  a  rather  perspicious 
form:  all  elimination  rules  must  come  before  all  introduction  rules.  Hence,  if 
i  is  maximal  such  that  rA‘  ends  with  an  elimination  rule,  then  Ai  must  be  a 
strictly  positive  subformuiaof  alt  for  j  i.  This  A,  is  called  the  minimal 
formula  of  the  path.  Also,  any  A}  with  j  <  i  is  a  strictly  positive  subformula 
of  Ai,  and  any  Aj  with  j  >  i  is  a  strictly  positive  subformula  of  Am. 

Theorem  2.4.1  (Subformula  property)  lfrA  is  a  normal  derivation  with 
free  assumption  variables  among  if' , . . . ,  and  sB  is  a  subdertvation  of 
%  rA ,  then  B  is  a  subformula  of  A  or  of  some  A, . 

The  proof  is  by  induction  on  the  order  of  paths  in  r,  using  the  property  of 
paths  in  norma!  derivations  mentioned  above.  O 

We  write  At, . . . ,  Am  1-  A  to  mean  that  there  is  a  derivation  rA  with  free 
assumption  variables  among  xf 1 , . . . ,  . 

Theorem  2.4.2  (Herbrand)  Assume  Vf)  A,, . . .  ,VxmAm  h  B  with  quan¬ 
tifier-fret  Aj, - A„,B.  Then  we  can  find  r„ . riBI, - rm 

such  that 

Ai(r)]J, ....  Aj(rln)], ....  Am[rmj], . . . ,  Am[rmnm]  t*  B 

Proof:  To  simplify  notation  let  us  assume  VjrA  h  B  with  quantifier-free  A.  B. 
By  section  2.2  we  can  construct  from  the  given  derivation  a  normal  derivation 
rB  with  free  assumption  variables  among  x*xA .  By  induction  on  the  order 
of  paths  it  is  easy  to  see  that  any  path  must  end  with  the  derivation  of  a 
quantifier-free  formula  and  must  begin  with  the  rule  V~,  i.  e.  with  x*TAri, 
Now  replace  any  such  subderivation  by  yl*  >  with  new  assumption  variables 
Vi-  a 

Our  next  application  is  the  Craig  interpolation  theorem.  We  shall  use  the 
notation  Ai , . . . ,  Am  K  A  fc  for  classical)  to  mean  that  there  is  a  derivation 
rA  with  free  assumption  variables  among  xA‘ , . . . ,  xAm  and  some  stability 
assumptions  for  P  predicate  variable  in  A.  A,  where  again  ->8 

denotes  B  — *  _L  with  a  fixed  propositional  variable  A. 
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Theorem  2.4.3  (Interpolation)  Assume  T,  A  b  A.  Then  we  can  find  a 
finite  list  F  of  formulas  such  that 

r  K  F  and  F,A\-C  A 

!  where  T  M  F  means  T  hc  F,  for  each  Fj  in  F),  and  any  object  or  predicate 
rariable  free  in  F  occurs  free  both  in  F  and  in  A,  A. 

For  the  proof  we  shall  use  a  somewhat  more  explicit  formulation  of  the  theo¬ 
rem:  Let  rA  be  a  derivation  with  free  assumption  variables  among  if ,  F4 . 

gi 

Then  we  can  find  a  finite  list  r,  of  derivations  with  free  assumption  va¬ 
riables  among  if  and  stability  assumptions  and  a  derivation  rA  with  free 
assumption  variables  among  and  stability  assumptions,  such  that  any 

object  or  predicate  variable  free  in  F  occurs  free  both  in  T  and  in  A./ A. 

For  brevity  we  shall  not  mention  stability  assumptions  any  more  (they 
will  only  he  used  in  case  26(ji)  below),  and  write  ~rA  with  i7r",  to  mean  the 
derivation  rA  with  free  assumption  variables  among  ur. 

The  proof  is  by  induction  on  the  height  of  the  given  derivation,  which  by 
section  2,2  we  can  assume  to  be  normal.  We  distinguish  two  cases  according 
to  whether  it  ends  with  an  introduction  rule  (i.  e.  ~>+  or  V+)  or  with  an 
elimination  rule. 

Case  la.  ( \xArB)A~~B  with  «*",  i>a.  By  induction  hypothesis  for  rB  with 
we  have  r,F  with  if  and  r®  with  y?.xA,vh.  An  application  of 
— >+  to  the  latter  derivation  yields  ( \xArB )A~B  with  y'b  ?A. 

Case  lb.  (Axr'*)VrA  with  if,  tiA,  where  x  is  not  free  in  T,  A.  By  induction 
hypothesis  for  rA  with  t7r,  we  have  if  with  ur  and  rA  with  y^.  FA.  Since 
x  is  not  free  in  f,  we  know  that  x  is  not  free  in  F.  An  application  of  V+  to 
the  latter  derivation  yields  (Axr£)Vr'4  with  yf , 

Case  2a.  (u>r~Dsc'f)''  with  ur,uA. 

Subcase  i.  U)C~°  is  among  if.  By  induction  hypothesis  for  sc  with 
if ,  va  we  have  .7,  F  with  ?A  and  sf  with  yh' .  if  .  By  induction  hypothesis 
for  (uDi]A  with  uD,  ur,uA  we  have  tP  with  uD,if  and  tA  with  iP,vA.  From 
these  derivations  we  obtain 

(Ay|i"(t^)4iuc'’0sf})^''(5  with  t7r 
and 

(tA )f{z^~^s( j  with  £/'~<\i7A, 
where  F  — »  O  means  F  — »  Gt, . . . ,  F  — >  G„. 
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Subcase  ii.  wc~D  is  among  vA.  By  induction  hypothesis  for  sc  with 
if,vA  we  have  sf  with  if  and  sf  with  tfr,tf>',<a.  By  induction  hypothesis 
for  (uDt)A  with  uD,if,vA  we  have  tf  with  t f  and  tA  with  if,  uD ,  vA.  From 
these  derivations  we  obtain 

(s„MW  with  f 

and 

(tj  )u[tec“0s^!  with 
Case  2b.  vfzCst  with  xf,vA. 

Subcase  i.  u>VrC  is  among  if .  By  induction  hypothesis  for  (ucbl<^''  wj^ 
uc^,  tf ,  uA  we  have  tf  with  uc  bl  ( {f  an(j  with  if,  vA.  Let  zbe  all  variables 
free  in  F  that  are  in  s,  but  not  free  in  [\  We  now  construct  derivations 

(Az(ff  )u[uiVlCs])v?^  with  if 


and 

with  fy?f,u4, 

where  Vzf  means  VzFi, . . .  ,VzFm.  Note  that  any  object  or  predicate  variable 
free  in  VzF  is  both  free  in  A,  A  and  free  in  T. 

Subcase  ii.  uiVlC  is  among  vA.  By  induction  hypothesis  for 
with  ucIdIur)i7A  we  have  ??  with  tf  and  fj  with  if1, ucM,  vA.  Let  z  be  all 
variables  free  in  G  that  are  in  s,  but  not  free  in  A,  A.  We  now  construct 
derivations 

(Avv?‘^(t ;zf,)rv?-'<5  with  if 

and 

(r"'/,-/‘Au''/'.i’yf^3AzAjf(u(^)u|u>VlCs)))/*  with  x-y**S,v* 

and  stability  assumptions  (which  are  used  to  build  f"'4— A).  Note  again  that 
any  object  or  predicate  variable  free  in  -iiz-'G  is  both  free  in  F  and  free  in 
A,  A.  a 
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Chapter  3 

Normalization  for  arithmetic 


3.1  Ordinal  notations 

We  want  to  discuss  the  derivability  and  underivability  of  initial  cases  of 
transfinite  induction  in  arithmetical  systems.  In  order  to  do  that  we  shall 
need  some  knowledge  and  notations  for  ordinals.  Now  we  do  not  want  to 
assume  set  theory  here;  hence  we  introduce  a  certain  initial  segment  of  the 
ordinal  (the  ordinals  <  e0)  in  a  formal,  combinatorial  way,  i.  e.  via  ordinal 
notations.  Our  treatment  is  based  on  the  Cantor  normal  form  for  ordinals 
(cf.  Bachmann  [Bac55]).  We  also  introduce  some  elementary  relations  and 
operations  for  such  ordinal  notations,  which  will  be  used  later. 

Definition  3.1.1  We  define  the  two  notions 

•  a  is  an  ordinal  notation 

•  a  <  0  for  ordinal  notations  o,0 
simultaneously  by  induction: 

1.  If  am, . . . ,  q0  are  ordinal  notations  and  am  >  ...  >  a0  (where  a  >  0 
means  a  >  0  or  a  =  0),  then 

w°~  +  .  • .  +  wa° 

is  an  ordinal  notation.  Note  that  the  empty  sum  denoted  by  0  is  allowed 
here. 
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2.  If  u)°m  +•••■(-  uiao  and  ur3”  +  •  ■  ■  +  ui'3°  are  ordinal  notations,  then 
u ;“m  +  ■  ■  ■  +  u>a°  <  + - f  uA 

iff  there  is  an  i  >  0  such  that  am_;  <  om-.+i  =  fin-i+t - - o.  = 

fin,  or  else  m  <  n  and  am  ~  fi„ . a0  =  0n_m 

It  is  easy  to  see  (by  induction  on  the  levels  in  the  inductive  definition) 
that  <  is  a  linear  order  with  0  being  the  smallest  element. 

We  shall  use  the  notation  1  for  u>°,  a  for  u>°  H - (-  u>°  with  a  copies  of  u>° 

and  uiaa  for  uia  +  ■  ■  ■  -f  a>a  again  with  a  copies  of  w“. 

Definition  3.1.2  of  addition  for  ordinal  notations: 

uiam  +  •  •  •  +  0Jaa  +  uA  +  ■  •  ■  +  uA  :=  uj°m  +  •  •  •  +  wa'  +  uA  +  ■  ■  ■  +  uA 

where  i  is  minimal  such  that  o,-  >  fin. 

It  is  easy  to  see  that  +  is  an  associative  operation  which  is  strictly  monotonic 
in  the  second  argument  and  weakly  monotonic  in  the  first  argument.  Note 
that  +  is  not  commutative:  I+u>  =  u;^u>+l. 

Definition  3.1.3  of  natural  (or  Hessenberg)  addition  of  ordinal  notations: 

(u>nm  +  •  ■  •  +  +  •  •  •  +uA)  :=  +••■+«■» 

where  7m+n, . . . ,  70  is  a  decreasing  permutation  of  am, - a0,  fin,  ■  ■  ■ ,  fio- 

Again  it  is  easy  to  see  that  #  is  associative,  commutative  and  strictly  mo¬ 
notonic  in  both  arguments. 

We  will  also  need  to  know  how  ordinal  notations  of  the  form  fi  +  can 
be  approximated  from  below.  First  note  that 

6  <  a  — *  fi  +  u >sa  <  fi  +  w". 

Furthermore,  for  any  7  <  fi  •+■  u>°  we  can  find  a  6  <  a  and  an  a  such  that 

7  <  fi  +  uisa. 

Definition  3.1.4  of  2°  for  ordinal  notations  a.  Let  am  >  ■  ■  •  00  >  w  > 
kn  >  ■  ■  •  >  k  1  >  0.  Then 

2u/“'"+...+u,*o+w*"+-+w‘> +w°a  +~+u‘> ' 1  2« 
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[t  is  easy  to  see  that  2a+l  =  2“  +  2“  and  that  2a  is  strictly  monotonic  in  a. 

In  order  to  work  with  ordinal  notations  in  a  purely  arithmetical  system 
we  set  up  a  bijection  between  ordinal  notations  and  nonnegative  integers 
(i.  e.,  a  Godel  numbering).  For  its  definition  it  is  useful  to  refer  to  ordinal 
notations  in  the  form 

+ - 1-  uiaca0  with  am  >  ■  ■  ■  >  a0. 

Definition  3.1.5  For  any  ordinal  notation  a  we  define  its  Godel  number  |or| 
inductively  by 

|0|  :=  0, 

+ - h^flol  :=  ( -  1. 

t<m 

For  any  nonnegative  integer  x  we  define  its  corresponding  ordinal  notation 
ox  inductively  by 

oO  =  0 

o((  n  ?»?')  - 1)  = 

»<m  *<m 

where  the  sum  is  to  be  understood  as  the  natural  sum. 

Lemma  3.1.1  1.  o|a|  =  a, 

2.  [oil  =  x.  □ 

This  can  be  proved  easily  by  induction. 

Hence  we  have  a  bijection  between  ordinal  notations  and  nonnegative 
integers.  Using  this  bijection  we  can  transfer  our  relations  and  operations 
on  ordinal  notations  to  computable  relations  and  operations  on  nonnegative 
integers.  We  will  use  the  notations 


i  -<  y 

for 

ox  <  oy 

w1 

for 

Krl 

x®y 

for 

l(oi)  +  (oy )  | . 
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3.2  Provable  initial  cases  of  transfinite  in¬ 
duction 

We  now  set  up  some  formal  systems  of  arithmetic  and  derive  initial  cases  of 
the  principle  of  transfinite  induction  in  them,  i.  e.  of 

Vx(Vy  •<  x  :  Py  — >  Px)  — *  Vx  -<  a  :  Px 

for  some  numeral  a  and  a  predicate  variable  P.  In  section  3.4  we  will  see 
that  our  results  here  are  optimal  in  the  sense  that  for  larger  segments  of 
the  ordinals  transfinite  induction  is  underivable.  All  these  results  are  due  to 
Gent2en  [Gen43]. 

Our  arithmetical  systems  are  based  on  a  fixed  (possibly  countably  infinite) 
supply  of  function  constants  and  predicate  constants  which  are  assumed  to 
denote  fixed  functions  and  predicates  on  the  nonnegative  integers  for  which  a 
computation  procedure  is  known.  Among  the  function  constants  there  must 
be  a  constant  S  for  the  successor  function  and  0  for  (the  0-place  function) 
zero.  Among  the  predicate  constants  there  must  be  a  constant  =  for  equalitv 
and  _L  for  (the  0-place  predicate)  falsity.  In  order  to  formulate  the  general 
principle  of  transfinite  induction  we  also  assume  that  predicate  variables 
P,  Q, .  ■ .  are  present. 

Terms  are  built  up  from  object  variables  x,y,z  by  means  of  fr\  ..  .rm, 
where  /  is  a  function  constant.  We  identify  closed  terms  which  have  the 
same  value;  this  is  a  convenient  way  to  express  in  our  formal  systems  the  as¬ 
sumption  that  for  each  function  constant  a  computation  procedure  is  known. 
Terms  of  the  form  SS ...  SO  are  called  numerals.  We  use  the  notation  S'O  or 
even  i  for  them.  Formulas  are  built  up  from  prime  formulas  Prt  . . .  rm  with 
P  a  predicate  constant  or  a  predicate  variable  by  means  of  (A  — >  B)  and 
VxA.  As  usual  we  abbreviate  A  — ►  X  by  -'A. 

The  axioms  of  our  arithmetical  systems  will  always  include  the  Peano- 
axioms 

Vx y(Sx  =  Sy  -*  x  =  y), 

'ix(Sx  ±  0). 

Any  instance  of  the  induction  scheme 

A[0],Vx(A[x]  -  A[5x])  -*  VxA[x] 

with  A  an  arbitrary  formula  is  an  axiom  of  full  arithmetic  Z.  We  will  also 
consider  subsystems  Z*  of  Z  where  the  formulas  A  in  the  induction  scheme 
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are  restricted  to  IT® -formulas;  the  latter  notion  is  defined  inductively,  as 
follows. 

1.  Any  prime  formula  Pf  is  a  n°-formula,  for  any  k  >  1. 

2.  If  A  is  quantifier-free  and  B  is  a  Il°-formula,  then  A  — »  B  is  a  11°- 
formula. 

3.  If  A  is  a  nj-formula  and  B  is  a  flf-formula,  then  A  — *  B  is  a  lip- 
formula  with  p  =  max(fc  +  1,/). 

4.  If  A  is  a  11° -formula,  then  so  is  VxA. 

Note  that  a  formula  is  a  lI°-formula  iff  it  is  logically  equivalent1  to  a  formula 
with  a  prefix  of  k  alternating  quantifiers  beginning  with  V  and  a  quantifier 
free  kernel.  For  example,  'ix3yizPxyz  is  a  Il°-formula.  In  addition,  in  any 
arithmetical  system  we  have  the  equality  axioms 

Vx(x  =  x), 

Vxy(xi  =  yu...,xm  =  ym  -*■  f£  =  /jf), 

Vxy(*!  =  yi,...,xm  =  ym, Px  Py) 

for  any  function  constant  /  and  predicate  constant  or  predicate  variable  P. 
We  also  require  for  any  such  P  the  stability  axioms 

Vx(->->Px  — >  Px). 

We  express  our  assumption  that  for  any  predicate  constant  a  decision  proce¬ 
dure  is  known  by  adding  the  axiom 

P(S"0) . . .  (S'm0) 

whenever  Pi  is  true,  and 

-P(S”  0)...(5-0) 

whenever  Pi  is  false. 

We  finally  allow  in  any  of  our  arithmetical  systems  an  arbitrary  supply  of 
true  nj-formuals  as  axioms.  Our  (positive  and  negative)  results  concerning 

'This  means  logically  equivalent  in  classical  logic,  i.  e.  both  implications  are  derivable 
by  the  rules  of  (minimal  logic  as  given  in)  chapter  2  together  with  stability  axioms. 
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initial  cases  of  transfinite  recursion  will  not  depend  on  which  of  those  axioms 
we  have  chosen,  except  that  for  the  positive  results  we  always  assume 


Vx(x  /  0)  (3.1) 

Vxy(z  X  y  ®u>°,z  /  y,z  /  y  -*  i.)  (3.2) 

Vx(x  ®  0  =  x)  (3.3) 

Vxyz(x  ®  (y  ©  z)  =  (x  ffi  y)  ©  z)  (3.4) 

Vx(0  ffi  x  =  x)  (3.5) 

VxKO  =  0)  (3.6) 

Vxy(ur(5y)  =  uizy  ©wr)  (3.7) 

Vxyz(z  X  y  ©  u>r,x  /  0,  x  — ►  z  X  y  ©u/IV*(yxyz))  (3.8) 

Vxyz(z  X  y  ©wr,  i  /  0  — ►  fxyz  x  i)  (3.9) 

where  in  3.9  /  and  g  are  function  constants. 


Theorem  3.2.1  (Gentzen)  Transfinite  induction  up  to  u>„  (with  := 
u),w„+i  :=u“")  i.  e.  the  formula 

Vx(Vy  X  x  :  A[y)  -*  A[xj)  -*  Vx  x  u>„  :  A[x) 

is  derivable  in  Z. 

Proof:  To  any  formula  A  we  assign  a  formula  A+  (with  respect  to  a  fixed 
variable  x)  by 

-4+  :=  Vy(Vz  x  y :  Ar[z] -Vrxyffiw1:  Ax[z)). 

We  first  show 

,4  is  progressive  — *  .4+  is  progressive. 

where  *B  is  progressive '  means  Vx(Vy  x  x  :  B[y]  — *  B[x)).  So  assume  that 
.4  is  progressive  and 

Vy  x  x  :  .4+[y].  (3.10) 

We  have  to  show  .4+[x],  So  assume  further 

V:xy:.4[:|  (3.11) 

and  z  X  y  ©  wr.  We  have  to  show  /l[z).  Case  x  =  0.  From  z  Xy®w0  we 
nave  by  3.2  z  -<  yV  z  =  y.  If  z  X  y,  then  .4[z]  follows  from  3.11.  and  if  z  =  y. 
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then  A[z]  follows  from  3.11  and  the  progressiveness  of  .4.  Case  x  /  0.  From 
-  -<  y  ®wr  we  obtain  z  X  y  ®^xyzgxyz  by  3.8  and  fxyz  X  x  by  3.9.  From 
3.10  we  obtain  A+[fxyz].  By  the  definition  of  A+  we  get 

Vu  -<  y  ©  Jzyxv  :  A[u]  -*  Vu  X  (y  ©  w/r!"u)  ©  w'**1  :  Afu] 

and  hence,  using  3.4  and  3.7 

Vu  -<  y  ©  u/IVIu  :  A[u]  — »  Vu  X  (y  ©  u/xl"(Su)  :  A[u]. 

Also  from  3.11  and  3.6,  3.3  we  obtain 

Vtt^jQw^'O:  A(u], 

Using  an  appropriate  instance  of  the  induction  scheme  we  can  conclude 
Vu  X  y  ©  ujtzy*gxyz  \  A[u] 


and  hence  A[z]. 

We  now  show,  by  induction  on  n,  how  to  obtain  a  derivation  of 

Vx(Vy  X  x  :  A[t/]  — »  A{x])  — *  Vx  x  u>n  :  A[x]. 

So  assume  the  left-hand  side,  i.  e.  assume  that  A  is  progressive.  Case  0. 
From  x  X  w0  we  get  x  =  0  by  3.5,  3.2  and  3.1,  and  A[0]  follows  from  the 
progressiveness  of  A  by  3.1.  Case  n  +  1.  Since  A  is  progressive,  by  what  we 
have  shown  above  also  A+  is  progressive.  Applying  the  induction  hypothesis 
to  A+  yields  Vx  X  uin  :  A+[x],  and  hence  A+[w„]  by  the  progressiveness  of 
A+[x].  Now  the  definition  of  A+  (together  with  3.1  and  3.5)  yields  Vz  X 
:  A\z).  a 

Note  that  in  these  derivations  the  induction  scheme  was  used  for  formulas 
of  unbounded  complexity. 

We  now  want  to  refine  Theorem  3.2.1  to  a  corresponding  result  for  the 
subsystems  Zy  of  Z,  Note  first  that  if  A  is  a  n®-formula,  then  the  formula 
A+  constructed  in  the  proof  of  Theorem  3.2.1  is  a  Il®+,  -formula,  and  for  the 
proof  of 

A  is  progressive  — >  A+  is  progressive 

we  have  used  induction  with  a  11°  induction  formula. 

Now  let  A  be  a  II°-formula,  and  let  A°  :=  A,  A,+>  :=  ( A‘)+.  Then  Ak  is  a 
tl“+, -formula,  and  hence  in  Zj,  we  can  derive  that  if  A  is  progressive,  then  also 
A1,  A1, ...  A*  are  all  progressive.  Let  wi[m]  :=  m,w,+i[m]  =  Since  in 
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Zk  we  can  derive  that  Ak  is  progressive,  we  can  also  derive  dfc[0],  j4*[1],  Ak[ 2] 
and  generally  Ak[m]  for  any  m,  i.  e.  d*[u>i(m]].  But  since 

=  (Afc~‘)+  =  Vy(Vz  -<  y  :  Ak~'[z]  -*  Vz  X  y®wr  :  d*-'[z]), 

we  first  get  (with  y  =  0)  Vz  -<  u)2[m]  :  d*_1[z]  and  then  ,4*-,[u;2[m]]  by 
the  progressiveness  of  Ak~l.  Repeating  this  argument  we  finally  obtain 
A°[uj/k+i(m]].  Hence  we  have 

Theorem  3.2.2  Let  A  be  a  n°-formula.  Then  in  Zk  we  can  derive  transfi- 
nite  induction  for  A  up  to  wt+)[m]  for  any  m,  i.  e. 

Zk  b  Vx(Vy  -<  x  :  d[y]  -►  d[x])  -*  Vx  X  wt+i[m]  :  A[x].  □ 

If  more  generally  we  start  out  with  a  n°-formula  A  instead,  where  1  <  l  <  k. 
then  a  similar  argument  yields 

Theorem  3.2.3  Let  A  be  a  Tit -formula,  1  <  /  <  k.  Then  in  Zk  we  can 
derive  transfinite  induction  for  A  up  to  u)*+2-i[m]  for  any  m,  i.  e. 

Zk  b  Vx(Vt/  x  x  :  d[j/]  ->  /4[x])  -+  Vx  <  tut+2_i[m] :  d[x],  □ 

Our  next  aim  is  to  prove  that  these  bounds  are  sharp.  More  precisely,  we 
will  show  that  in  Z  (no  matter  how  many  true  n?-formulas  we  have  added 
as  axioms)  one  cannot  derive  transfinite  induction  up  to  eo,  i.  e.  the  formula 

Vx(Vy  x  x  :  Py  —*  Px )  — *■  VxPx 

with  a  free  predicate  variable  P ,  and  that  in  Zk  one  cannot  derive  transfinite 
induction  up  to  w*+1,  i.  e.  the  formula 

Vx(Vy  X  x  :  Py  — *  Px)  -*ViX  u>k+\  ■  Px. 

This  will  follow  from  the  method  of  normalization  applied  to  arithmetical 
systems,  which  we  have  to  develop  first. 


3.3  Normalization  for  arithmetic  with  the 
ot-rule 

We  will  show  in  section  3.5  that  a  normalization  theorem  does  not  hold  for  a 
system  of  arithmetic  like  Z  in  section  3.2,  in  the  sense  that  for  any  formula 
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A  derivable  in  Z  there  is  a  derivation  of  the  same  formula  A  in  Z  which  only 
uses  formulas  of  a  level  bounded  by  the  level  of  A.  The  reason  for  this  failure 
is  the  presence  of  the  induction  axioms,  which  can  be  of  arbitrary  level. 

Here  we  remove  that  obstacle  against  normalization  and  replace  the  in¬ 
duction  axioms  by  a  rule  with  infinitely  many  premises,  the  so-called  u-rule 
(suggested  by  Hilbert  and  studied  by  Lorenzen,  Novikov  and  Schutte),  which 
allows  to  conclude  VxA[x]  from  A[0],  A{1],  4 [2], .... 

Clearly  this  w-rule  can  also  be  used  to  replace  the  rule  V+.  As  a  conse¬ 
quence  we  do  not  need  to  consider  free  object  variables. 

So  we  introduce  the  system  of  w-arithmetic  as  follows.  Z°°  has  the 
same  language  and  —  apart  from  the  induction  axioms  —  the  same  axioms  as 
Z.  Derivations  in  Z°°  are  infinite  objects;  they  are  built  up  from  assumption 
variables  xA,yB  and  constants  ax'4  for  any  axiom  A  of  Z  other  than  an 
induction  axiom  by  means  of  the  rules 

(\xArB)A~B 

(tA~BsA)B 

denoted  by  — ►+,— *~,w  and  V",  respectively. 

More  precisely,  we  define  the  notion  of  an  x-derivation2  (i.  e.  a  derivation 
in  Z°°  with  free  assumption  variables  among  x)  of  height  <  a  and  degree 
<  k  inductively,  as  follows3 

*  Any  assumption  variable  xA  and  any  axiom  ax'4  is  an  x-derivation  of 
height  <  a  and  degree  <  fc,  for  any  list  x  of  assumption  variables 
(containing  x  in  the  first  case),  ordinal  a  and  number  k. 

-*+  If  rB  is  an  x,  x,y-derivation  of  height  <  a0  <  a  and  degree  <  k,  then 
(A xArB)A~'B  is  an  x,  y-derivation  of  height  <  a  and  degree  <  k. 

2Note  that  derivations  are  infinite  objects  now.  They  may  be  viewed  as  mappings 
from  finite  sequences  of  natural  numbers  (=  nodes  in  the  derivation  tree)  to  lists  of  data 
including  the  formula  appearing  at  that  node,  the  rule  applied  last,  a  list  of  assumption 
variables  including  all  those  free  in  the  subderivation  (starting  at  that  node),  a  bound  on 
the  height  of  the  subderivation,  and  a  bound  on  the  degree  of  the  subderivation. 

intuitively,  the  degree  of  a  derivation  is  the  least  number  >  the  level  of  any  subde¬ 
rivation  Axr  in  a  context  (A*r)s  or  frj)«u  in  a  context  where  the  level  of  a 

derivation  is  the  level  of  its  type,  i.e.  the  formula  it  derives.  This  notion  of  a  degree  is 
needed  for  the  normalization  proof  we  give  below. 
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If  tA~~B  and  sA  are  x-derivations  of  heights  <  a,  <  a  and  degrees 
<  ki  <  k  (i  =  1,2),  then  (tA~ BsA)B  is  an  x-derivation  of  height  <  a 
and  degree  <  m  with  m  =  max(fc,  level  (A  — *  B)),  if  tA~B  is  generated 
by  the  rule  — *+,  or  of  degree  <  k  otherwise. 

ui  If  are  x-derivations  of  heights  <  a,  <  a  and  degrees  <  k,  < 
k  (i  <  cj),  then  (rt^)^fA  is  an  x-derivation  of  height  <  a  and  degree 
<k. 

V-  If  t*zA  is  an  x-derivation  of  height  <  ao  <  a  and  degree  <  k,  then 
(fVx'4i)-4I*l  is  an  x-derivation  of  height  <  a  and  degree  <  m  with  m  = 
max(fc,  level  Vx/1),  if  lilA  is  generated  by  the  rule  w,  or  of  degree  <  k 
otherwise. 

We  now  embed  our  systems  Zk  (i.  e.  arithmetic  with  induction  restricted 
to  nj|! -formulas)  and  hence  Z  into  Z°°. 

Lemma  3.3.1  Let  rB  be  a  derivation  in  Zt  with  free  assumption  variables 
among  xA  which  contains  <  m  instances  of  the  induction  scheme  all  with 
induction  formulas  of  level  <  k.  Let  o  be  a  substitution  of  numerals  for 
object  variables  such  that  Acr,  B<r  do  not  contain  free  object  variables.  Then 
we  can  find  an  xA" -derivation  (r00)B'7  in  Z°°  of  height  <  u>m  +  h  for  some 
h  <  ui  and  degree  <  k. 

Proof:  First  note  that  from  any  normal  derivation  in  first-order  logic  we  can 
construct  a  normal  derivation  rB  with  the  same  free  assumption  variable  xA, 
such  that  in  r§  any  path  has  a  prime  formula  as  its  minimal  formula  (cf  p. 
23).  For  if  A  is  a  minimal  formula  which  is  not  prime  we  can  first  apply 
elimination  rules  until  a  prime  formula  is  reached  and  later  build  a  up  again 
by  the  corresponding  introduction  rules. 

The  lemma  is  proved  by  induction  on  the  height  of  the  given  derivation 
r.  By  the  normalization  theorem  2.2.1  and  the  note  above  we  can  assume 
that  r  is  normal  with  prime  minimal  formulas.  The  only  case  which  requires 
some  argument  is  when  r  consists  of  two  applications  of  — ♦”  to  an  instance 
of  the  induction  scheme.  Then  r  must  have  the  form 

ax^(Ol,Vr(a(I]-^(Sil)-Vr/t(r|s^[Ol(AlAsfX(rl^[S(x]j 

By  induction  hypothesis  we  obtain  derivations 

of  height  <  uim~x  +  ha 
of  height  <  a.”*-1  -2  +  fc„ 

^oo*ll^oo,|[soo°1])  of  height  <  •  3  +  h} 
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and  so  on,  all  of  degree  <  k.  Combining  all  these  derivations  of  A[i]  as 
premisses  of  the  w-rule  yields  a  derivation  tx  of  VxAfxj  of  height  <  u>m  and 
degree  <  Jfc.  Q 

A  derivation  is  called  convertible  if  it  is  of  the  form  (Axr)s  or  else  (r,),<MJj, 
which  can  be  converted  into  rx[s]  or  rJ:  respectively.  Here  rx[s]  is  obtained 
from  r  by  substituting  s  for  all  free  occurences  of  x  in  r.  A  derivation  is 
called  normal  if  it  does  not  contain  a  convertible  subderivation.  Note  that  a 
derivation  of  degree  <  0  must  be  normal. 

We  want  to  define  an  operation  which  by  repeated  conversions  transforms 
a  given  derivation  into  a  normal  one  with  the  same  end  formula  and  no 
more  assumption  variables.  The  methods  employed  in  sections  1  and  2  to 
achieve  such  a  task  have  to  be  adapted  properly  in  order  to  deal  with  the 
new  situation  of  infinitary  derivations.  Here  we  give  a  particularly  simple 
argument  due  to  W.W.  Tait  [Tai65]. 

Lemma  3.3.2  If  r  is  an  x,  xA,  y-derivat  n  of  height  <  a  and  degree  <  k 
and  sA  is  an  x,y -derivation  of  height  <  ft  and  degree  <  l,  then  rx[s]  is  an 
x,y -derivation  of  height  <  ft  +  a  and  degree  <  ma x(k,t,  level  s). 

This  is  proved  by  a  straightforward  induction  on  the  height  of  r.  □ 

Lemma  3.3.3  For  any  x-derivation  rA  of  height  <  a  and  degree  <  k  +  1 
we  can  find  an  x-derivation  (rk)A  of  height  <  2“  and  degree  <  k. 

The  proof  is  by  induction  on  a.  The  only  case  which  requires  some  argument 
is  when  r  is  of  the  form  ts  with  t  of  height  <  art  <  a  and  s  of  height  <  qj  <  a. 
We  first  consider  the  subcase  where  tk  =  Axfj  and  level  t  =  k  +  1.  Then 
level  s  <  k  by  the  definition  of  level,  and  hence  (<i)r(s*]  has  degree  <  k 
by  Lemma  3.3.2.  Furthermore,  also  by  Lemma  3.3.2,  (ti)x[s*]  has  height 
<  2"J  +  2"'  <  <  2°.  Hence  we  can  take  (ts)k  to  be  (ti)i[s*).  If 

we  are  not  in  the  above  subcase,  we  can  simply  take  (ts)k  to  be  tksk.  This 
derivation  clearly  has  height  <  2°.  Also  it  has  degree  <  k,  which  can  be  seen 
as  follows.  If  level  t  <  k  we  are  done.  If  however  level  t  >  k  +  2,  then  t  must 
be  of  the  form  tat i . . .  tm  for  some  assumption  variable  or  axiom  t0  (since  r 
has  degree  <  Jfc  +  1).  But  then  tk  has  the  form  Mi  •  ■  •  and  we  are  done 
again.  (To  be  completely  precise,  this  last  statement  has  to  be  added  to  the 
formulation  of  the  Lemma  above  and  proved  simultaneously  with  it).  □ 

As  an  immediate  consequence  we  obtain 

Theorem  3.3.1  For  any  x-derivation  rA  of  height  <  a  and  degree  <  k 
we  can  find  a  normal  x-derivation  (r’)'4  of  height  <  2 *a  ( where  2oo  = 
a,2m+ia  =  2J").  □ 


3.4  Unprovable  initial  cases  of  transfinite  in¬ 
duction 

We  now  apply  the  technique  of  normalization  for  arithmetic  with  the  u)-rule 
for  a  proof  that  transfinite  induction  up  to  e0  is  underivable  in  Z,  i.  e.  of 

Z  1/  Vx(Vy  X  x  :  Py  Px)  — ►  VxPx 

with  a  predicate  variable  P,  and  that  transfinite  induction  up  to  u>k+\  is 
underivable  in  Zt,  i.  e.  of 

Zk  I /Vx(Vx  X  x  :  Py  — >  Px)  — >  Vx  -<  :  Px. 

Our  proof  is  based  on  an  idea  of  Schiitte,  which  consists  in  adding  a  so-called 
progression  rule  to  the  infinitary  systems.  This  rule  allows  to  conclude  Pj 
(where  j  is  any  numeral)  from  all  Pi  f or  i  x  j. 

More  precisely,  we  define  the  notion  of  an  x-derivation  in  Z°°  +  Prog(P) 
of  height  <  a  and  degree  <  k  by  the  inductive  clauses  of  section  3.2  and  the 
additional  clause 

Prog (P)  If  rp  are  x-derivations  of  heights  <  a,  <  a  and  degrees  <  k,  <  k 
(i  -<  j),  then  is  an  x-derivation  of  height  <  a  and  degree  <  k. 

Since  this  progression  rule  only  deals  with  derivations  of  prime  formulas  it 
does  not  affect  the  degrees  of  derivations.  Hence  the  proof  of  normalization 
for  Z°°  carries  over  unchanged  to  Z°°  +  Prog(P).  In  particular  we  have 

Lemma  3.4.1  For  any  x-derivation  rA  in  Z°°+  Prog(P)  of  height  <  a  and 
degree  <  k  +  1  we  can  find  an  x-derivation  (rk)A  in  Z°°  +  Prog(P)  of  height 
<  2“  and  degree  <  k.  □ 

We  now  show  that  from  the  progression  rule  for  P  we  can  easily  derive  the 
progressiveness  of  P. 

Lemma  3.4.2  We  have  a  normal  derivation  of  Vx(Vy  x  x  :  Py  — *  Px)  in 
Z°°  +  Prog(P)  with  height  <  5. 

Proof:  By  the  w-rule  it  suffices  to  derive  Vy  X  j  :  Py  —»  Pj  for  any  j  with 
height  <  4.  We  argue  informally.  Assume  Vy  X  j  :  Py.  By  V"  we  have 
i  X  j  — ►  Pi  for  any  i.  Now  for  any  i  X  j  we  have  i  X  j  as  an  axiom;  hence 
Pi  for  any  such  i.  An  application  of  the  progression  rule  yields  Pj,  with  a 
derivation  of  height  <  3.  Now  by  — *+  and  u>  the  claim  follows.  □ 
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The  crucial  observation  now  is  that  a  normal  derivation  of  P\0\  must  es¬ 
sentially  have  a  height  of  at  least  0.  However,  to  obtain  the  right  estimates  for 
our  subsystems  Z*  we  cannot  apply  Lemma  3.4.1  down  to  degree  0  (i.  e.  to  the 
normal  form)  but  must  stop  already  at  degree  1.  Such  derivations,  i.  e.  those 
of  degree  <  1 ,  will  be  called  almost  normal ;  they  can  also  be  analyzed  easily. 
An  almost  normal  derivation  r  in  Z°°  +  Prog (P)  is  called  a  P|5|,  ->P|/?|- 
refutation  if  r  derives  a  formula  A  — *  B  with  A  and  the  free  assumptions  in  r 
among  P|<3|  :=  P|ai|, . . . ,  P|am|  and  — >P|^|  :=  ->P|/?i|, . . . ,  ^P\0n\  and  true 
prime  formulas,  and  B  a  false  prime  formula  or  else  among  ->P \f}\. 

Lemma  3.4.3  Let  r  be  an  almost  normal  P\3\,-'P\0\-refutation  of  height 
<  | r |  with  3  and  0  disjoint.  Then 

min  0  <  |r |  +  #a, 
where  #3  denotes  the  number  of  ordinals  in  3. 

Proof:  By  induction  on  a.  Note  that  we  may  assume  that  r  does  not  con¬ 
tain  either  ui  or  else  V~.  Note  also  that  r  cannot  be  an  equality  axiom 
ax/’M.M=l<l-P|<l  with  7  =  8  true,  since  we  have  assumed  that  5  and  0  are 
disjoint.  We  distinguish  cases  according  to  the  last  rule  in  r. 

Case  — *+:  By  our  definition  of  refutations  the  claim  follows  immediately 
from  the  induction  hypothesis. 

Case  — Then  r  =  If  A  js  a  true  prime  formula,  the 

claim  follows  from  the  induction  hypothesis  for  t.  If  A  is  a  false  prime 
formula,  the  claim  follows  from  the  induction  hypothesis  for  s.  If  A  is  -’->P|7| 
(and  hence  t  =  axVr*',',Pl_*Pr'|7|),  then  since  the  level  of  -|-,P|7|  is  2  the 
derivation  must  end  with  an  introduction  rule,  i.  e.  s  =  Ax-’^Sp 

(for  otherwise,  since  no  axiom  contains  some  ->->Pro  as  a  strictly  positive 
subformula,  we  would  get  a  contradiction  against  the  assumption  that  r  has 
degree  <  1).  The  claim  now  follows  from  the  induction  hypothesis  for  s  . 
The  only  remaining  case  is  when  A  is  Pill-  Then  t  is  an  almost  normal 
P|7|,  P|a|, -iP|/9|  -refutation  and  s  is  an  almost  normal  P|o|,“«P|/?|, -<P|7| 
-refutation.  We  may  assume  that  7  is  not  among  5,  since  otherwise  the  claim 
follows  immediately  from  the  induction  hypothesis  for  t.  Hence  we  have  by 
the  induction  hypothesis  for  t 

m\n0  <  |(|  +  #<J  +  1  <  |r|  +  #5. 
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Case  Prog (P).  Then  r  =  By  induction  hypothesis,  since  rs  is 

a  P|a|,  ^P\fi\,-'P\6\  -refutation,  we  have  for  all  6  <  7 

min (0,6)  <  |rf|  +  #a  <  |r|  +  #a 


ans  hence 

min(/?,7)  <  |r|  +  #a.  a 

Now  we  can  show 

Theorem  3.4.1  Transfinite  induction  up  to  e 0  is  underivable  in  Z,  i.  e. 

Z  I /Vx(Vy  <x  :  Py  —>  Px)  —*  VxPx 

with  a  predicate  variable  P,  and  that  transfinite  induction  up  to  u>k+\  is  un¬ 
derivable  in  Zk,  i.  e. 

Zk  t/Vx(Vi  -<  x  :  Py  — >  Px)  — <•  Vx  -<  1  :  Px. 

Proof:  We  restrict  ourselves  to  the  second  part.  So  assume  that  transfinite 
induction  up  to  u;*+1  is  derivable  in  Zk.  Then  by  the  embedding  of  Z*  into 
Z°°  (Lemma  3.3.1)  and  the  normal  derivability  of  the  progressiveness  of  P 
in  Z°°  +  Prog(P)  with  finite  height  (Lemma  3.4.2)  we  can  conclude  that 
Vi  -<  Wfc+1  :  Px  is  derivable  in  Z°°  +  Prog(P)  with  height  <  ujm  +  h  for 
some  m,h  <  u>  and  degree  <  k.  Now  k  —  1  applications  of  Lemma  3.4.1 
yield  a  derivation  of  the  same  formula  Vi  -<  u;t+I  :  Px  in  Z°°  +  Prog(P)  with 
height  <  7  <  2*_i (u>m  4 ■  h)  <  u>k+\  and  degree  <  1,  hence  also  a  derivation 
of  P I7  +  1|  in  Z°°  +  Prog (P)  with  height  <  7  and  degree  <  1.  But  this 
contradicts  Lemma  3. 4. 3.0 


3.5  Normalization  for  finitary  arithmetic  is 
impossible 

The  normalization  theorem  for  first-order  logic  applied  to  arithmetic  Z  is  not 
particularly  useful  since  we  may  have  used  in  our  derivation  induction  axioms 
of  arbitrary  complexity.  Hence  it  is  tempting  to  first  eliminate  the  induction 
scheme  in  favour  of  an  induction  rule  allowing  to  conclude  Vi/l[i]  from  a 
derivation  of  .4(0]  and  a  derivation  of  4[Sx]  with  an  additional  assumption 
A[x]  to  be  cancelled  at  this  point  (note  that  this  rule  is  equivalent  to  the 
induction  scheme),  and  then  to  try  to  normalize  the  resulting  derivation  in 
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the  new  system  Z  with  the  induction  rule.  We  will  apply  our  results  from 
section  3.4  to  show  that  even  a  very  weak  form  of  the  normalization  theorem 
cannot  hold  in  Z  with  the  induction  rule. 

Theorem  3.5.1  The  following  weak  form  of  a  normalization  theorem  for  Z 
with  the  induction  rule  is  false:  For  any  xA  -derivation  rB  with  A,  B  n°- 
formulas  there  is  an  xA -derivation  (r*)B  containing  only  U°k-formulas,  with 
k  depending  only  on  l. 

Proof:  Assume  that  such  a  normalization  theorem  would  hold.  Consider  the 
rig-formula 

Vx(Vi/  -<  x  :  Py  — *  Px)  — >  Vx  -<  usn+l  :  Px 

expressing  transfinite  induction  up  toun+i.  By  Theorem  3.2.1  it  is  derivable 
in  Z.  Hence  there  exists  a  derivation  of  the  same  formula  containing  only  H®- 
formulas,  for  some  k  independent  of  n.  Hence  Zk  derives  transfinite  induction 
up  to  un+i  for  any  n.  But  this  clearly  contradicts  Theorem  3.4.1.  Dae 
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LCF  with  realizing  terms:  a  framework  for 
the  development  and  verification  of  programs 

Helmut  Schwichtenberg* 
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July  25,  1988 

A  continuous  functional  in  the  sense  of  Scott  and  Ersov  (cf.  [14],  [5],  [12], 
[16])is  computable  iff  it  is  given  by  a  recursive  enumeration  of  its  finite  ap¬ 
proximations.  In  section  1  we  describe  a  functional  programming  language  L 
which  may  be  used  to  define  computable  functionals  and  give  an  operational 
semantics  for  it  Parallel  computation  is  necessary  because  non-strict  function¬ 
als  are  allowed.  The  tales  of  the  operational  semantics  for  L  are  then  extended 
into  a  formal  system  T  which  is  also  an  extension  of  Scott’s  LCF  introduced 
in  [13].  T  can  be  used  to  prove  that  programs  in  L  meet  their  specifications. 
The  specification  language  contains  (as  in  [15])  a  constructive  existential  quan¬ 
tifier  3*  in  addition  to  the  usual  classical  quantifier  3  (defined  by  — «V— <).  This 
provides  a  useful  means  to  express  in  certain  situations  exactly  the  intended 
computational  content  of  a  formula'.  E.g.  the  subtype  construct  in  [4,  p.32], 
whose  purpose  is  to  to  provide  a  mechanism  for  hiding  information,  becomes 
superfluous  here.  We  also  give  two  examples  of  programs:  In  section  3  for  the 
Warshall  algorithm,  which  decides  reachability  in  finite  graphs,  and  in  section 
4  for  an  algorithm  derived  from  a  proof  of  a  version  of  the  intermediate  value 

’Thanks  are  due  to  Camegie-Mellon-Univerjity  (Pittsburgh,  Pennsylvania,  USA)  for  its  hos¬ 
pitality  during  my  visit  there  in  the  academic  year  1987/88,  and  to  the  Stiftung  Volks  wagenwerk 
for  a  grant  which  made  this  visit  possible.  I  further  want  to  thank  Steve  Brookes,  Daniel  Leivant, 
Frank  Pfenning,  Dana  Scott  and  Rick  Statman  for  helpful  discussions. 

'The  distinction  made  in  [10]  between  the  propositions  that  have  a  ’ computational  informa¬ 
tive “  contents  and  the  propositions  that  only  have  a  * logical "  contents  seems  to  serve  a  similar 
purpose. 
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theorem  in  constructive  real  analysis,  which  may  be  specialized  to  compute  the 
squareroot  of  2.  It  is  discussed  how  one  can  derive  in  T  that  these  programs 
meet  their  specifications. 

1  Operational  semantics  for  a  functional  program¬ 
ming  language 

We  use  simple  types  built  up  from  nat  and  bool  by  p  — ►  a.  Expressions  E  are 
built  up  by  application  from 

1.  typed  variables  X :  p,Y :  p,... 

2.  constants  k*  for  all  finite  approximations  A  (see  [16]) 

3.  some  other  constants,  e.g.2 

•  +1,-1 :  nat  -*  nat 

•  Zero :  nat  -*  bool 

•  =,  <,  <:  nat  — *  (nat  — »  bod) 

Expressions  of  type  bod  are  denoted  by  B.  Programs  are  built  up  from  assign¬ 
ments  Y  :=  E  (where  the  variable  Y  may  occur  in  E)  and  skip  by 

•  /V,. 

•  if  B  then  P  else  Q  fi 

•  for  X  =  1  to  Z  do  P  od,  where  Z  £  var{P) 

•  while  X  =  0  do  P  od 

Here  var(P)  should  consist  of  all  variables  which  P  may  either  read  to  get  an 
input  or  else  write  into  to  produce  output  More  precisely,  we  define 

•  var(X  :-E)m  var^E)  U  {X} 

2No*b  that  constants  like  trne,  fabctbool  and  not:  boot— •  boot  need  not  be  mentioned  since 
they  are  given  by  finite  approximations. 
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•  var(skip)  =  0 

•  vartfY,  • .  • PJ  =  vaH.Pi)  U  •  -  •  U  vaiiP, ) 

•  var(» if  B  then  P  else  Q  od)  =  var(B)  U  var(P)  U  vor(Q) 


•  var(for  X  =  1  to  Z  do  P  od)  =  (vaiiP)  U  {Z})  \  {X} 

•  vor( while  X  =  0  do  P  od)  =  var(P)  U  {X} 

We  now  give  the  rules  of  our  operational  semantics,  in  the  style  of  Hoare- 
logics.  However,  we  only  have  to  deal  here  with  Hoare-triples  having  equations 
as  pie-  and  postconditions3. 


Thinning  rules  Let  X  =  var(P). 


y3*  {X  =  x}P{X  -  x1}  x’3? 

{X  *y}P{X  =  y} 


{X  »  J-}P{X  =  X) 


{X  =  x]P{X  =  x'} 

{X*x,Y*y}P{X  =  x',Y  =  y} 

where  in  the  last  rule  it  is  assumed  that  the  sequence  F  of  variables  contains 
no  variable  from  var(P)*. 

E\  Let  X  be  minimal  such  that  var(E)  C  {X,  F}. 

{X  =  x,F  =  y}F  :==  £{X  =  x,  F  =  E(X,  Y/x,  y)} 


skip)  This  rule  allows  just  to  write  down  skip  considered  as  a  Hoare -triple  with 
empty  pre-  and  postconditions. 


m. 


3Such  Hoare-triples  have  also  received  special  attention  in  recent  work  of  S.  Brookes;  see 


4 Here  and  later  we  use  X,  Y, ...  to  denote  single  variables  as  well  as  sequences  of  variables. 
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P\Q 


Let  X  =  var{P)  U  vatiQ). 


{X  =  x}P{X  =  y}  {X=x'}Q{X=x"} 
{X=x}P;Q{X  =  x"} 


if  B  then  P  else  Q  fij  Denote  this  program  by  R,  and  let  X  =  var(R). 

BQC/x)  =  true  {X  =  x}P{X  =  x'} 

{X  «  x}/?{X  =  x*} 


and  similarly  for  false. 

{X  =  x}P{X  =x'}  {X  =  x)Q{X  =  x'} 

{X  =  x}/?{X=x-} 

|  for  X  «  1  to  Z  do  P  od  |  Denote  this  program  by  Q,  and  note  that  Z  £  var(P). 
Let  y  be  minimal  such  that  var(P)  C  {X,  y}. 


{Z  **  0,Y  -  y}Q{Z  -  0,Y  -  y) 


{Z  =  z,y=y}Q{Z  =  z,y  =  /}  {X  =  z-t-l,y  =  y,}P{X=x',y=/}  Z^-L 
{Z  =  z+l,y  =  y}Q{Z  =  z  +  l,y  =  y"} 

| while X  =  0  do  P  od|  Denote  this  program  by  Q.  Let  Y  be  minimal  such  that 
var(P)  C  {X,  y}.  We  use  an  auxiliary  relation  constant  P*. 

P*xyxy 

p-xyoy  {x»o,y=y'}P{x  =  x,,y  =  y"} 

P'xyxfy" 

P’xyxfy'  X'/O.i. 

{x»x,y=y}Q{x  =  x',y  =  y} 


f 

\ 

l 

t 

E 


4 


To  simplify  our  notation,  we  often  leave  out  preconditions  of  the  form  X  =  X. 
We  also  allow  ourselves  to  leave  out  any  of  the  postconditions.  So  asserting 
the  derivability  of  {X  =  x}P{Y  =  y}  means  that  for  some  jc',  we  can  derive 
{ X=x,Y=±}P{X  =  x!,Y  =  y } 

Lemma  1  (Uniqueness)  If  {X  =  x}P{Y  =  y}  and  {X  =  x}P{Y  =  /}  are  deriv¬ 
able  with  Y :  nat  and  y,y  f  JL,  then  y  =/. 

For  the  proof  one  has  to  generalize  the  claim  sightly  to  cover  the  case  that 
Y  is  of  a  higher  type. 

By  the  Uniqueness  Lemma,  any  program  P  defines  with  respect  to  X  C 
variP)  taken  as  input  variables  and  Y  €  var(P)  taken  as  output  variable  a  func¬ 
tional.  It  can  be  shown  that  the  functionals  definable  in  L  in  this  way  are  exactly 
the  computable  ones.  For  the  proof  one  may  use  either  Feferman’s  (see  [6])  or 
else  Plotkin’s  (see  [12]  )  characterization  of  the  computable  functionals  by  means 
of  schemata. 

The  fragment  of  L  obtained  by  leaving  out  while-loops  defines  a  proper 
subclass  of  the  computable  functionals,  which  may  be  called  partial  primitive 
recursive  functionals.  These  are  closely  related  to  the  Gtidel  primitive  recursive 
functionals  of  [7]:  A  functional  is  partial  primitive  recursive  iff  it  is  the  restric¬ 
tion  of  a  GOdel  primitive  recursive  functional  to  a  Gtidel  primitive  recursive 
domain. 


2  Formal  verification  of  programs 

In  our  specification  language  we  distinguish  between  the  classical  existential 
quantifier  3,  defined  by  and  a  constructive  existential  quantifier  denoted 
by  3*  ;  V*  can  then  be  defined  by 

A  V*  B  :=  3*((x  =  true  — ♦  A)  A  (x  =  false  — *  B)). 

Formulae  without  3*  are  called  negative  and  their  proofs  do  not  have  any  com¬ 
putational  content.  More  generally,  this  also  holds  for  formulae  which  contain 
3*  only  on  left-hand-sides  of  implications,  the  so-called  Harrop-formulae.  Note 
that  -A  is  considered  as  defined  by  A  -*  false. 

A  formula  containing  3*  can  be  considered  as  posing  a  problem  or  a  task, 
which  may  be  fulfilled  by  providing  certain  functionals  realizing  it  in  the  sense 
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of  [8].  E.g.,  Vx3*y.x  <  y  requires  a  function,  Vjc3*y3'z.x  <  y  <  z  requires 
two  functions,  and  (Yr3’yjt  <  y)  — *  3*z.O  <  z  requires  a  functional  of  type 
(nat  — ♦  nat)  — »  nat. 

Generally,  for  any  formula  A  we  define  a  finite  sequence  pt, . . . ,  p„  :=  p  of 
types  such  that  functionals  ...,£*  of  these  types  are  needed  to  realize  A.  The 
definition  is  as  follows.  To  a  prime  formula  there  belongs  the  empty  sequence 
of  types.  If  0  are  the  types  of  A  and  a  =  aj, . . . ,  am  are  the  types  of  B,  then 

•  p,  3  are  the  types  of  A  A  B, 

•  p  — ►  <7i, . . . ,  p  — ►  <7*  are  the  types  of  A  — »  B, 

•  p  — »  oi, . . . ,  p  — *  <rm  are  the  types  of  V£  :  p.B  and 

•  p,  of  are  the  types  of  3*£  :  p.B. 

So,  in  particular,  to  any  Harrop-formula  A  there  belongs  the  empty  sequence  of 
types. 

To  say  that  a  program  P  meets  its  specification  A  then  means  that  P  can 
be  considered  as  defining  (see  p.5)  a  sequence  . . . , of  functionals  of  the 
types  pi,...,p„  associated  toA  as  above,  and  that  (i, . . . ,  realize  A  (written 
£i,  A)  in  the  sense  of  the  following  definition. 

f.CeAAB  :«-»  (6A  A  (gB 

<,,...,C*€A-a  VffceA  -  Ct 

V(((,(,...,(W(6A) 

C  €  3*(.A  €A 

So  any  judgement?  £i...,&,  6  A  with  A  a  formula  possibly  containing  the 
constructive  existential  quantifier  3*  can  be  replaced  by  an  ordinary  negative 
formula  and  hence,  in  principle,  it  is  not  necessary  to  consider  judgements. 
However,  in  the  context  of  program  verification  it  is  very  useful  to  have  the 
suggestive  notation  of  judgements  available. 

Note  that  the  distinction  between  3  and  3*  can  also  be  used  to  obtain  the 
effect  of  the  subtype  construct  in  [4,  p.32].  As  it  is  stated  there,  the  main  purpose 


sTWa  terminology  b  due  to  Weyl  and  was  taken  up  by  Martin-U5f;  see  (9] 


of  the  subtype  construct  is  to  provide  a  mechanism  for  hiding  information  to 
simplify  computation.  Now  this  can  obviously  be  achieved  by  considering  all 
(  :  nat  — ►  nat  satisfying  3x  :  nat£(x)  =  0  instead  of  3*x :  nat.£(x)  =  0. 

Formal  verification  of  programs  requires  the  setup  of  a  formal  system  T 
which  allows  the  derivation  of  certain  valid  judgements.We  do  not  go  into  the 
details  of  T  here  but  only  note  some  of  the  essential  points.  T  has  variables 
x,y,z,...  ranging  over  finite  approximations  as  well  as  variables  C>  7,  •  •  •  rang¬ 
ing  over  ideals  of  finite  approxknations(i.e.continuous  functionals),  both  for  all 
simple  types.  So  T  is  essentially  a  second  order  system.  However,  the  existence 
axioms  for  the  second  order  variables  £,  (,  r>, . . .  are  weak  and  require  only  the 
existence  of  the  computable  functionals.  The  basic  ingredients  of  T  are  the  rules 
given  above  for  our  operational  semantics.  They  are  considered  as  a  means  to 
derive  prime  formulae.  We  also  allow  formal  induction  over  nat  as  a  rule  in  T, 
as  well  as  a  version  of  Scott’s  induction  on  C  introduced  in  [13];  hence  T  can 
be  considered  as  an  extension  of  Scott’s  LCF. 

Using  standard  techniques  from  proof  theory6  it  can  be  shown  that  T  is  a 
conservative  extension  of  classical  arithmetic  Z. 


3  The  Warshall  algorithm 

We  have  choosen  the  Warshall  algorithm  as  an  example  because  it  has  been  dealt 
with  in  the  literature  in  a  number  of  places  (see  Broy  and  Pepper  [3],  Pfenning 
[11]),  and  so  our  treatment  can  be  compared  with  theirs. 

Given  a  finite  directed  graph  and  two  nodes  x,y  in  it,  the  Warshall  algorithm 
decides  whether  there  is  a  path  p  in  the  graph  connecting  x  and  y.  Furthermore, 
it  provides  such  a  path  if  there  is  any. 

We  first  write  out  a  formal  specification  of  the  algorithm.  In  order  to  do  that 
conveniently,  we  extend  our  type  system  by  allowing  cartesian  products  p  x  a 
and  list  types  listp.  Expressions  E  may  now  also  contain  the  constants 7 

•  pair  :  p  -*  (<t  -*  p  x  <r) 

•  fst :  p  x,  <7  — ►  p 

‘Either  by  defining  a  model  for  T  within  Z,  or  else  directly  by  normalization  as  in  (15J. 
Here  and  later  we  use  a  notation  derived  from  Common  Lisp;  see  [17] 
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•  Slid  :  p  x  a  — >  a 

•  nil :  listp 

•  cons  :  p  x  listp  — <•  listp 

•  car  :  listp  — »  p 

•  cdr  :  listp  -»  listp 

We  also  assume  that  we  have  the  following  constants  available. 

assoc  :  p  x  list(p  x  a)  — »  <y 

append  :  listp  x  listp  — >  listp 
length  :  listp  -»  nat 
elt :  listp  x  nat  — ►  p 

and  that  our  formal  system  derives  the  obvious  (conditional;  equations  for  these, 
e.g.‘ 

assoc  (r,  cons(pair(jr,y),  l))=y 
x  y  y  -*  assoc(x,  cons(pair(y,  y, ),  [))  =  assoclt,  f) 
assocfr,  nil)  =  nil 

append(nil,  q)~q 

append(cons(x,  p),  q)  =  cons(jr,  append(p,  <?)) 

In  our  specification,  we  assume  that  the  nodes  x,y  of  the  graph  are  given 
by  natural  numbers  <  n,  and  that  the  result  of  the  algorithm  consists  of  an 
association  list  a  :  Iist((nat  x  nat)  x  list  nat)  with  the  following  property.  If 
there  is  a  path  p  in  the  graph  leading  from  x  to  y,  then  a  assigns  something  to 
pair(x,y),  and  if  a  assigns  something  to  pair(x,y),  then  this  is  a  path  p  leading 
from  x  to  y. 

'Note  that  we  may  easily  write  out  programs  (without  while- loops)  which  define  assoc, 
append  etc.  and  then  treat  formulas  containing  such  defined  functions  as  abbreviations  for  other 
formulas  which  contain  instead  Hoare-triptes  for  these  program].  In  this  setup  the  equations 
above  become  provable. 


8 


It  is  convenient  to  assume  that  the  edges  of  the  graph  are  given  in  the  form 
of  an  association  list  a  :  list((nat  x  nat)  x  list  nat),  which  assigns  to  any  x,  y 
connected  in  the  graph  the  list  C x,y)  of  length  2. 

Our  specification  now  reads  as  follows: 

Ve,  n3’aVx,y,p( init(e,  n)  A  x,y  <  n  — * 

(1)  (path (e,x,y,p)  -»  assoc(pair(j:,  y),  a)  J  nil)A 

(assoc(pair(x,  y)),  a)  j-  nil  — »  path(e,x,y,  assoc(pair(x,y),a)))) 


where 


init(e,  n)  :=  Vi  <  length(e)[length(snd(elt(e,  i)))  =  2  A 
fst(fst(elt(e,  /)))  =  car(snd(elt(e,  /)))  <  n  A 
snd(fst(elt(e,  i)))  =  car(cdr(snd(elt(e,  i))))  <  n] 
path(e,x,  y,p)  :=  elt(p,0)=xA 

elt(p,  length(p)  —  1)  =  y  A 

Vi  <  length(/?)  —  l(assoc(pair(e!t(p,  i),  elt(p,  i  +  1)),  e)  i-  nil) 


An  obvious  way  to  give  a  constructive  proof  of  (1)  and  hence  an  algorithm 
yielding  a  from  e,  n  would  be  to  introduce  an  additional  parameter  m  and  use 
induction  on  m  <  n  to  prove 

Ve,  n,m3*aVjc,y,p(init(e,  ri)  A  x,y  <  n  A  m  <  n 
(2)  (path(e,  m,  x,  y ,  p )  -»  assoc(pair(*,  y),  a)  /  nil)A 

(assoc(pair(x,  y)),  a)  ^  nil  -»  path(e,nj,x,y,  assoc(pair(.x,>),a)))) 


where  init(e,  n )  is  as  above  and 

path (e,m,x,y,p)  :=  elt(p,0)  =  jc  A 

elt(p,  length(p)  -  1)  =  y  A 

Vi  <  length(p)  -  l(assoc(pair(elt(p,  i),  elt(p,  i  +  1)),  e )  J  nil) 
Vi  <  length(p)  -  2(elt(p,i  +  1)  <  m) 
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In  the  induction  step  a^i  is  obtained  from  am  by  appending  a  certain  increment 
i  to  it.  This  increment  is  again  an  association  list.  For  any  pair  x,y  <  n  it  has 
an  entry  iff  am  does  not  already  assign  a  connecting  path  to  x,  y,  but  an  does 
assign  a  path  xpm  to  x,  m  and  also  a  path  mqy  to  m,  y.  Then  i  assigns  the  path 
xpqm  to  x,  y. 

Note  that  the  use  of  association  lists  to  store  the  previous  results  is  essential 
for  the  effectiveness  of  the  algorithm9.  If  instead  we  would  have  used  recursive 
calls  of  the  algorithm  itself,  then  the  fact  that  for  any  x,y,m  we  need  two  such 
recursive  calls  would  lead  to  an  exponentially  growing  number  of  recomputa¬ 
tions. 

Our  program  Pw  for  the  Warshall  algorithm  now  reads  as  follows: 

A  :=  E; 

for  2  =  1  to  M  do 
/  :=  nil; 

forX  =  1  to  N  do 
for  Y  =  1  to  N  do 

if  assoc(pair(X,  Y),  A)  =  nil 
and  assoc(pair(X,  Z),  A)  4  nil 
and  assoc(pair(Z,  Y),A)  4  nil 
then  I  :=cons(pair(pair(X,  Y), 

append(assoc(pair(X,  Z),  A) 

cdr(assoc(pair(Z,  Y),  A))))y 

n 

else  skip 
11 
od 
od; 

A  :=  appendf/,  A) 
od 


If  we  abbreviate  the  specification  (2)  above  by  Ve,  n,  m3'aS(e,  n,  m,  a),  then  Pw, 
considered  as  a  functional  mapping  values  of  E,N,  M  into  a  value  of  A,  realizes 

*ThU  has  also  been  stressed  in  [It]. 
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(2)  in  the  sense  of  section  2,  i.e.  we  can  derive  in  T 
# 

{E  =  e,N  =  n,M  =  =  a}  — »  5(c,  a,  m,  a), 


using  fonnal  induction  on  m. 

4  An  algorithm  for  the  intermediate  value  theorem 

We  have  choosen  this  example  since  it  involves  a  treatment  of  some  basic 
concepts  from  constructive  analysis  in  the  style  of  Bishop  [1],  and  also  makes 
use  of  the  fact  that  we  have  a  higher  order  language  available. 

Let  J  be  a  monotonic,  uniformly  continuous  real  function  on  [0, 1]  such 
that  J (0)  <  0  <7(1).  To  simplify  matters  we  assume  that/  happens  to  map 
rationals  into  rationals;  so  J  can  be  restricted  to  a  function  / :  Qn  [0, 1]  — *  Q. 
We  want  to  construct  a  real  a  such  that /(a)  =  0.  To  "thieve  this,  we  construct 
sequences  an,  bn  of  rationals  such  that 


/(a.)<0</(6„) 


0  =  oo  <  a.  <  aw+i  <  <  bH  <  b0  =  1 

Then  obviously  (a*)  is  a  Cauchy  sequence  with  Cauchy  modulus  M(e)  :=  least 
m  such  that  ^  <  «.  Hence  we  have  a  real  which  is  a  zero  of  J . 

We  now  want  to  write  out  the  corresponding  programs  and  the  specifications 
they  are  supposed  to  satisfy.  To  do  this  we  have  to  be  somewhat  more  explicit. 

Let  Q  :=  (nat  x  bool)  x  nat  be  the  type  of  rationals;  for  a  :  Q,  we  write  a  6  Q 
to  mean  a  J  J.  A  snd(a)  ^  0  and  a  6  Q*  to  mean  a  €  Q  A  snd(fst(a))  =  true. 
Assume 

/(0),/(l)^±  A /(0)<0</(l)  (1) 

Va,  b  6  Q  n  [0,  Wia),f{b)  ^  1  A  a  <  b  — *  /(a)  <  f(b)  e  Q)  (2) 
Vtefl’WO/i^wWeg*)  (3) 

V«  €  b  e  gn[0, 1]M«)  f  ±A f{a),m  ¥  -LA|a-b|  <  w(«)  \f{a)-m\  <  e) 

(4) 

To  construct  the  sequences  a*,b*  we  use  the  program  P„q. 
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A  :=0; 

B:=  1; 

for  Y=  1  to  /V  do 
C.=  A  +  \(B~A)\ 
Z  :=  app (F,  C); 
if  0  <  Z 
then  B  :=C 
else  A  :=C 
fi 
od 


Then  we  can  derive,  using  induction  on  n, 

{/?  =  *}P„f  {A  =  B  =  b*}  A  a„,  bH  i  -L  A  (1, 2)  -* 

/(«.),/(&.)  J  -LA 
/(a.)<0</(ftB)A 
b„  -  a.  =  yA 
0  <  a.  <  B,  <  1 

So  in  this  sense  computes  a  zero  for  J. 

Furthermore,  we  may  use  /*««  to  obtain,  given  e  g  Q* ,  an  approximation 
a.  to  this  zero  of  J  such  that  <  e.To  achieve  this,  we  first  construct  a 

program  which,  given  eg  Q*,  computes  an  n  such  that  £  <  w(e): 

Delta  :=  appfOmega,  Epsilon); 

Q  :=snd(Delta); 

P  :=fst(fet(Delta)); 
for  X  ■  0  to  Q  do 
\t  Q<2X*P 
then  N  :=X 
else  skip 
od 


Then  for  we  can  derive 
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{Epsilon  =  e,  Omega  =  w}Pmi»{N  -  n} 

A«€Q* 

A  w(e)  f  X 
A  (3) 

-*  nJL 

A  £  <  w(e) 

Finally  we  have,  as  required, 

{F  =/,  Epsilon  =  e,  Omega  =  P„,{A  =a,B-b } 

Ae€g* 

A  u>(e)  y  X 
A  a,b  f  A. 

A  (1-4) 

-  lAa)|<«- 
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